Article: Literature Review

Computational design of protein-protein interactions

Abstract

Computational protein design strategies have been developed to reengineer protein-protein interfaces in an automated, generalizable fashion. In the past two years, these methods have been successfully applied to generate chimeric proteins and protein pairs with specificities different from naturally occurring protein-protein interactions. Although there are shortcomings in current approaches, both in the way conformational space is sampled and in the energy functions used to evaluate designed conformations, the successes suggest we are now entering an era in which computational methods can be used to modulate, reengineer and design protein-protein interaction networks in living cells.

... [1][2][3] It also includes the structures of many protein-protein complexes and multicomponent assemblies. 2 In recent years, the design of new protein structures and new protein-protein interactions to generate new proteins or complexes with a desired function has become a new research focus in protein science. [4][5][6][7] One powerful experimental tool for the stepwise adaptation of a protein sequence toward a desired function is the directed evolution method. It is based on random mutagenesis of a protein sequence or segment of a protein and subsequent selection according to a desired target function. ...
... 21,23 This method has been used in several applications to successfully redesign proteins or to create completely new stable folded proteins. 4,5,7,23 Combinations of rational design and directed evolution have also been used successfully to generate new or altered proteins. ...
... 9,24 In addition to the design of single proteins, rational approaches have also been used to design stable protein-protein interactions. 6,7,25,26 A common method implemented in the Rosetta approach is based on identification of a putative interacting region, for example, a small putative interaction motif (Motifgraft), and the subsequent computational de novo design of a new protein partner around the motif, possibly with additional interactions to the partner protein. 25,27 Such an approach is, however, computationally quite demanding and requires the computational evaluation of many in silico residue substitutions and further energy minimization and molecular modeling steps. ...
Article
In order to generate protein assemblies with a desired function, the rational design of protein-protein binding interfaces is of significant interest. Approaches based on random mutagenesis or directed evolution may involve complex experimental selection procedures. Also, molecular modeling approaches to design entirely new proteins and interactions with partner molecules can involve large computational efforts and screening steps. In order to simplify at least the initial effort for designing a putative binding interface between two proteins, the Match_Motif approach has been developed. It employs the large collection of known protein-protein complex structures to suggest interface modifications that may lead to improved binding for a desired input interaction geometry. The approach extracts interaction motifs based on the backbone structure of short (4-residue) segments and their relative arrangement with respect to short segments on the partner protein. The interaction geometry is used to search through a database of such motifs in known stable bound complexes. All matches are rapidly identified (within a few seconds) and collected, and can be used to guide changes in the interface that may lead to improved binding. In the output, an alternative interface structure is also proposed based on the frequency of occurrence of side chains at a given interface position in all matches and on steric considerations. Applications of the procedure to known complex structures and alternative arrangements are presented and discussed. The program, data files and example applications can be downloaded from: https://www.groups.ph.tum.de/t38/downloads/.
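The frequency-based part of the proposal step described in this abstract can be illustrated with a short sketch. It assumes that the motif matches have already been found and reduced to (interface position, residue type) observations. The function name `consensus_interface` and the input layout are hypothetical and are not taken from the Match_Motif program; the steric filtering mentioned in the abstract is left out.

```python
from collections import Counter, defaultdict


def consensus_interface(observations):
    """Return the most frequent residue type per interface position.

    observations: iterable of (position, residue) pairs, one per matched motif.
    This input layout is a hypothetical simplification, not Match_Motif's format.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for position, residue in observations:
        counts[position][residue] += 1

    consensus = {}
    for position, counter in counts.items():
        # Highest count first; residue name as tie-breaker.
        residue, n = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        consensus[position] = (residue, n / sum(counter.values()))
    return consensus


# Toy observations (made up for illustration only).
toy = [(12, "LEU"), (12, "LEU"), (12, "ILE"), (15, "ASP")]
proposal = consensus_interface(toy)
```

For the toy input, position 12 is assigned LEU with a relative frequency of two thirds, and position 15 is assigned ASP. A real workflow would still need to check each proposed side chain for clashes before accepting it.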
... The design of novel protein-protein and peptide-protein interfaces holds vast potential in medicinal and agricultural biologics and remains a key goal for computational molecular design. Two key challenges are faced when attempting to accomplish this goal: an enormously large search space, requiring unrealistic computation times, and flawed computational prediction methods that may "miss" many good solutions [1]. Overcoming these problems is critical for computational peptide/protein design. ...
... The Gibbs free energy of binding of the E9 DNase with its cognate immunity protein IM9 has been experimentally determined at -22 kcal/mol [37], while using our software (with AMBER GB/SA included for energy evaluations) a binding energy of -20.9 kcal/mol was calculated. Alanine scanning mutations for this complex have been reported experimentally [35] and calculated in silico, using the Rosetta alanine scanning procedure (using the Robetta server) [38,1]. An AMBER GB/SA calculated repeat of the alanine scanning experiment by Wallis et al. [35] showed a correlation of 0.76 to the experimental alanine scanning mutagenesis of these rotamers. ...
... The resulting top models from each of the two sets show that the MC following one cycle of RAD led to a superior complex lowers its binding affinity by 2 orders of magnitude while the mutation of the aspartic acid to alanine in the non-cognate increases its binding affinity by 2 orders of magnitude, validating the importance of the D33L mutation (Figure 5B). The second most common amino acid in position 33 was Q33; while no experimental evidence of this replacement has been reported, the Robetta alanine scanning server calculated that the mutation Q33A has a ΔΔG of 6.7 kcal/mol [38,1], mostly due to hydrogen bonding with S71 in E9 (Figure 5C). As in the cognate complex, the E30 salt bridge was retained in the RAD and ISE ...
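To relate the "2 orders of magnitude" change in affinity quoted above to the ΔΔG values reported in kcal/mol, the standard relation ΔΔG = RT ln(Kd,mutant / Kd,wild-type) can be used. The sketch below assumes a temperature of 298.15 K, which the excerpt does not state; the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd_ratio(kd_ratio, temperature_k=298.15):
    """Return ddG = RT * ln(Kd_mutant / Kd_wildtype) in kcal/mol.

    A positive value means the mutant binds more weakly than the wild type.
    The default temperature is an assumption, not a value from the excerpt.
    """
    if kd_ratio <= 0:
        raise ValueError("kd_ratio must be positive")
    return R_KCAL * temperature_k * math.log(kd_ratio)


weaker_100x = ddg_from_kd_ratio(100.0)   # affinity lowered 100-fold
stronger_100x = ddg_from_kd_ratio(0.01)  # affinity raised 100-fold
```

Under these assumptions a 100-fold loss of affinity corresponds to roughly +2.7 kcal/mol, and a 100-fold gain to roughly -2.7 kcal/mol.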
Article
ABSTRACT While machine learning techniques have greatly increased molecular modeling capabilities, the frequent reliance on stochastic algorithms is a limiting factor due to slow optimization processes. Faster algorithms are thus sought for large-scope projects. Nevertheless, stochastic algorithms also provide a distinct advantage by providing a solution ensemble rather than a single optimal solution. Producing a large ensemble of solutions is critical in problems where not all solution aspects are predictable and unpredictable properties may be key to ultimate success. Similar problems were tackled in the field of finance before the advent of machine learning. In 1952, Harry Markowitz introduced modern portfolio theory (MPT), which uses the heuristics of risk and return to optimize a financial portfolio. In this study, we introduce an implementation of MPT heuristics in the field of protein-protein and peptide-protein interface design and show examples of its usage.
... Deciphering the details of PPIs requires the use of both physical chemistry analysis and observed interactions in experimentally determined protein complexes [10,11]. Protein interfaces have been extensively studied to develop a detailed understanding of the forces and recognition processes at a molecular level [11][12][13][14][15][16][17][18][19]. This knowledge can be expanded by accounting for mutations, which can cause proteins' conformations and interface properties to change [20,21]. ...
... Histidine is a small outlier among the polar residues, as it has an average mutation score of -2.16. Histidine is also the first amino acid to have any mutations that are predicted by Amber to typically be beneficial (lysine) or neutral (glutamine). ...
Article
The inventions of AlphaFold and RoseTTAFold are revolutionizing computational protein science due to their abilities to reliably predict protein structures. Their unprecedented successes are due to the parallel consideration of several types of information, one of which is protein sequence similarity information. Sequence homology has been studied for many decades and depends on similarity matrices to define how similar or different protein sequences are to one another. A natural extension of predicting protein structures is predicting the interactions between proteins, but similarity matrices for protein-protein interactions do not exist. This study conducted a mutational analysis of 384 non-redundant antibody–protein antigen complexes to calculate antibody-protein interaction similarity matrices. Every important residue in each antibody and each antigen was mutated to each of the other 19 commonly occurring amino acids and the percentage changes in interaction energies were calculated using three force fields: CHARMM, Amber, and Rosetta. The data were used to construct six interaction similarity matrices, one for antibodies and another for antigens using each force field. The matrices exhibited both commonalities, such as mutations of aromatic and charged residues being the most detrimental, and differences, such as Rosetta predicting mutations of serines to be better tolerated than either Amber or CHARMM. A comparison to nine previously published similarity matrices for protein sequences revealed that the new interaction matrices are more similar to one another than they are to any of the previous matrices. The created similarity matrices can be used in force field specific applications to help guide decisions regarding mutations in protein-protein binding interfaces.
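The bookkeeping behind such a matrix can be sketched as follows. The sketch assumes each mutation record carries the wild-type residue, the mutant residue, and the interaction energies before and after mutation, and that a matrix entry is the mean percentage change over all records for that substitution. The record layout, the sign convention, and the plain averaging are assumptions for illustration; the study may normalize its matrices differently.

```python
from collections import defaultdict


def percent_change(e_wild, e_mut):
    """Percentage change in interaction energy relative to |wild type|."""
    if e_wild == 0:
        raise ValueError("wild-type interaction energy must be non-zero")
    return 100.0 * (e_mut - e_wild) / abs(e_wild)


def substitution_matrix(records):
    """Average percentage change for each (wild-type, mutant) residue pair.

    records: iterable of (wt_residue, mut_residue, e_wild, e_mut) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wt, mut, e_wild, e_mut in records:
        sums[(wt, mut)] += percent_change(e_wild, e_mut)
        counts[(wt, mut)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


# Made-up energies, for illustration only.
toy_records = [("TRP", "ALA", -10.0, -6.0), ("TRP", "ALA", -20.0, -16.0)]
matrix = substitution_matrix(toy_records)
```

With favorable energies negative, a positive entry in this convention means the substitution weakens the interaction on average; the toy TRP to ALA entry comes out at 30 percent.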
... 3 While empirical sequence design of non-natural de novo biomolecules based on DNA, 4 lipids 5 and proteins 6 to construct non-natural biomaterials has been widely successful, introduction of the computational toolbox for defining biomolecule design has added an exciting new dimension to material discovery while also enabling tunability of structure and function at the nanoscale in a systematic manner. Consequently, in silico computational design tools are contributing significantly to the flourishing science of DNA origami 7,8 and to supramolecular protein assembly, 9 especially via protein-protein interface modification 10 and docking algorithms. [11][12][13] However, much work remains in terms of sequence-based design and structure predictions for peptide and protein folding and assembly, which is complicated by multiple local interactions, large numbers of potential conformations, and sequence-dependent structural features, the sum of which is commonly referred to as the 'protein folding problem'. ...
... The sample mean and standard error in radius, R, and fractal dimension, D, are reported here. The effective diameter, D_eff, and its uncertainty (±1σ) are calculated via eqn (10) and error propagation, respectively (see ESI). ... acidic pH conditions for the long rigid rod samples, which we have reported elsewhere (data not shown). 28 Overall, a balance of the magnitude of net charge and the balance of opposite charges along the rod length combined with average rod length and distributions inhibits lyotropic liquid crystal formation here. ...
Article
Short α-helical peptides were computationally designed to self-assemble into robust coiled coils that are antiparallel, homotetrameric bundles. These peptide bundle units, or ‘bundlemers’, have been utilized as anisotropic building blocks to construct bundlemer-based polymers via a hierarchical, hybrid physical-covalent assembly pathway. The bundlemer chains were constructed using short linker connections via ‘click’ chemistry reactions between the N-termini of bundlemer constituent peptides. The resulting bundlemer chains appear as extremely rigid, cylindrical rods in transmission electron microscope (TEM) images. Small angle neutron scattering (SANS) shows that these bundlemer chains exist as individual rods in solution with a cross-section that is equal to that of a single coiled coil bundlemer building block of ≈ 20 Å. SANS further confirms that the interparticle solution structure of the rigid rod bundlemer chains is heterogeneous and responsive to solution conditions, such as ionic strength and pH. Due to their peptidic constitution, the bundlemer assemblies behave like polyelectrolytes that carry an average charge density of approximately 3 charges per bundlemer as determined from SANS structure factor data fitting, which describes the repulsion between charged rods in solution. This repulsion manifests as a correlation hole in the scattering profile that is suppressed by dilution or addition of salt. Presence of rod cluster aggregates with a mass fractal dimension of ≈ 2.5 is also confirmed across all samples. The formation of such dense, fractal-like cluster aggregates in a solution of net repulsive rods is a unique example of the subtle balance between short-range attraction and long-range repulsion interactions in proteins and other biomaterials. With computational control of constituent peptide sequences, it is further possible to deconvolute the underlying sequence-driven structure-property relationships in the modular bundlemer chains.
... In recent years, the possibility to design new synthetic protein-protein complexes with a desired function has gained significant momentum. 7,8 One goal is to modify existing natural proteins in such a way that the geometry and affinity of a known protein-protein interaction may change or the interaction with a different protein surface on another protein partner becomes possible. In the longer run, it is desired to create completely new protein partners with programmed surface properties to allow for new interactions with selected candidate partners. ...
... 15,16,25,35 In addition, the de novo design of protein-protein interactions requires docking or modeling of new protein-protein interfaces. 7 Second, for the majority of natural stable protein-protein interactions one often finds homologous complex geometries in the database of experimentally determined complexes. 36 Hence, many interactions, especially those mediated by reoccurring protein domains, can be modeled based on similarity to an already known interaction. ...
Article
Protein–protein interactions form central elements of almost all cellular processes. Knowledge of the structure of protein–protein complexes but also of the binding affinity is of major importance to understand the biological function of protein–protein interactions. Even weak transient protein–protein interactions can be of functional relevance for the cell during signal transduction or regulation of metabolism. The structure of a growing number of protein–protein complexes has been solved in recent years. Combined with docking approaches or template‐based methods, it is possible to generate structural models of many putative protein–protein complexes or to design new protein–protein interactions. In order to evaluate the functional relevance of putative or predicted protein–protein complexes, realistic binding affinity prediction is of increasing importance. Several computational tools ranging from simple force‐field or knowledge‐based scoring of single protein–protein complexes to ensemble‐based approaches and rigorous binding free energy simulations are available to predict relative and absolute binding affinities of complexes. With a focus on molecular mechanics force‐field approaches the present review aims at presenting an overview on available methods and discussing advantages, approximations, and limitations of the various methods. This article is categorized under: Molecular and Statistical Mechanics > Molecular Interactions Molecular and Statistical Mechanics > Free Energy Methods Software > Molecular Modeling
... Due to the importance of protein-protein interactions in myriad cellular processes, much effort has been invested in the development of methods to redesign interacting pairs for desired affinity and specificity, and even to design entirely new partners. Such methods typically focus on improving affinity (Kastritis and Bonvin, 2012), and have driven a wide range of applications (Kortemme and Baker, 2004; Schreiber and Fleishman, 2013), including improvement of antibody binding affinities (Kuroda et al., 2012; Lippow and Tidor, 2007; Lippow et al., 2007), design of inhibitors against infectious ...
... Optimizing the affinity of interacting proteins (for better or worse) requires predicting the effects of mutations on binding. A wide range of approaches to this problem have been pursued, including all-atom molecular dynamics (Deng and Roux, 2009; Moretti et al., 2013; Weis et al., 2006), empirically derived physical potentials (Brender and Zhang, 2015; Guerois et al., 2002; Kortemme and Baker, 2004), statistical contact potentials (Pons et al., 2011; Tharakaraman et al., 2013; Vangone and Bonvin, 2015), machine learning methods combining multiple such features (Dehouck et al., 2013; Wang et al., 2012) and target-specific data-driven models (Kamisetty et al., 2015; Nielsen et al., 2008; Thomas et al., 2009). The curation of extensive databases of experimentally measured binding free energies (Moal and Fernández-Recio, 2012; Sirin et al., 2016) has recently enabled evaluation of the predictive ability of some such methods (i.e. ...
Article
Motivation: Disruption of protein-protein interactions can mitigate antibody recognition of therapeutic proteins, yield monomeric forms of oligomeric proteins, and elucidate signaling mechanisms, among other applications. While designing affinity-enhancing mutations remains generally quite challenging, both statistically and physically based computational methods can precisely identify affinity-reducing mutations. In order to leverage this ability to design variants of a target protein with disrupted interactions, we developed the DisruPPI protein design method (DISRUpting Protein-Protein Interactions) to optimize combinations of mutations simultaneously for both disruption and stability, so that incorporated disruptive mutations do not inadvertently affect the target protein adversely. Results: Two existing methods for predicting mutational effects on binding, FoldX and INT5, were demonstrated to be quite precise in selecting disruptive mutations from the SKEMPI and AB-Bind databases of experimentally determined changes in binding free energy. DisruPPI was implemented to use an INT5-based disruption score integrated with an AMBER-based stability assessment and was applied to disrupt protein interactions in a set of different targets representing diverse applications. In retrospective evaluation with three different case studies, comparison of DisruPPI-designed variants to published experimental data showed that DisruPPI was able to identify more diverse interaction-disrupting and stability-preserving variants more efficiently and effectively than previous approaches. In prospective application to an interaction between enhanced green fluorescent protein (EGFP) and a nanobody, DisruPPI was used to design five EGFP variants, all of which were shown to have significantly reduced nanobody binding while maintaining function and thermostability. 
This demonstrates that DisruPPI may be readily utilized for effective removal of known epitopes of therapeutically relevant proteins. Availability and implementation: DisruPPI is implemented in the EpiSweep package, freely available under an academic use license. Supplementary information: Supplementary data are available at Bioinformatics online.
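One common way to frame the simultaneous optimization of two objectives, such as disruption and stability, is to keep only the variants that are not dominated in either score. The sketch below shows such a Pareto filter on made-up numbers. It is a generic illustration and is not claimed to be the algorithm implemented in DisruPPI or EpiSweep; the score names, their orientation (higher is better), and the variant labels are all assumptions.

```python
def pareto_front(variants):
    """Return names of variants not dominated in (disruption, stability).

    variants: list of (name, disruption, stability) tuples, where higher
    values are assumed to be better for both hypothetical scores.
    A variant is dominated if another one is at least as good in both
    scores and strictly better in at least one.
    """
    front = []
    for name, d, s in variants:
        dominated = any(
            (d2 >= d and s2 >= s) and (d2 > d or s2 > s)
            for _, d2, s2 in variants
        )
        if not dominated:
            front.append(name)
    return front


# Made-up scores, for illustration only.
toy_variants = [
    ("A", 3.0, 1.0),
    ("B", 2.0, 2.0),
    ("C", 1.0, 0.5),
    ("D", 3.0, 0.5),
]
selected = pareto_front(toy_variants)
```

In the toy data, A and B survive because each is better than the other in one score, while C and D are each beaten by A. The quadratic scan is adequate for small candidate lists; large combinatorial design spaces need a dedicated optimizer.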
... The field of CPD has been reviewed in the frame of the general methodology [1][2][3][4][5][6][7][8] as well as specific methodological aspects such as library-scale CPD [9], multistate approaches and backbone flexibility [10][11][12][13], electrostatics [14], fragment databases [15], and energy landscapes [16,17]. Specific CPD applications and protein family targets have been reviewed such as protein therapeutics [18], ligand binding and enzyme catalysis [19][20][21][22], binding specificity [23,24], membrane proteins [25,26], metalloproteins [27], collagens [28], conformational switches [29], and protein-protein interactions [30]. Numerous other aspects are presented as part of this very book which is the first book with this title. ...
... 6. Protein-protein interactions (PPI): While a binding site is often specifically designed for a non-amino acid moiety, protein-protein interaction CPD includes the stable or transient interaction between spatial patches of amino acids that are on the surface-accessible part of the protein [30]. Numerous case studies of PPI CPD were applied with altered specificity [61,62] and affinity [63]. ...
Chapter
Computational protein design (CPD) has established itself as a leading field in basic and applied science with a strong coupling between the two. Proteins are computationally designed from the level of amino acids to the level of a functional protein complex. Design targets range from increased thermo- (or other) stability to specific requested reactions such as protein–protein binding, enzymatic reactions, or nanotechnology applications. The design scheme may encompass small regions of the proteins or the entire protein. In either case, the design may aim at the side-chains or at the full backbone conformation. Herein, the main framework for the process is outlined highlighting key elements in the CPD iterative cycle. These include the very definition of CPD, the diverse goals of CPD, components of the CPD protocol, methods for searching sequence and structure space, scoring functions, and augmenting the CPD with other optimization tools. Taken together, this chapter aims to introduce the framework of CPD.
... The Robetta algorithm automatically mutated each of the residues individually to alanine to calculate the difference in free energy of the binding complex. [22] Column 5 covers the significant ΔΔG results computed upon alanine mutation by Robetta-AlaScan. Predicted changes in protein stability of the main mutated complex partner are also given as a ΔG (partner) upon alanine mutation by Robetta [Column 6]. ...
Article
Purpose: RNYK is a selective agonist of the neurotrophic tyrosine kinase receptor type 2 (NTRK2) which has been screened from a phage-displayed peptide library. Its sequence is SGVYKVAYDWQH, similar to a native NTRK2 ligand, that is, brain-derived neurotrophic factor (BDNF). The current study was performed to recognize and confirm critical residues for RNYK activity in a glaucoma-on-a-chip model. Methods: We designed a modified RNYK (mRNYK) peptide based on hotspots of the RNYK sequence identified by alanine scanning. The critical residues consisted of tyrosine, valine, aspartic acid, and tryptophan (YVDW); however, lysine and glutamine were also maintained in the final sequence (YKVDWQ) for forming amide bonds and peptide dimerization. The affinity of mRNYK binding was confirmed by testing against NTRK2 receptors on the surface of ATRA-treated SH-SY5Y cells. The neuroprotective effect of mRNYK was also evaluated in cell culture after elevated pressure insult in a glaucoma-on-a-chip model. Results: The primary amine on the lysine side-chain from one sequence (YKVDWQ) reacted with a γ-carboxamide group of glutamine from the other sequence, forming dimeric mRNYK. In silico, molecular dynamics simulations of the mRNYK-NTRK2 complex showed more stable and stronger interactions as compared to the RNYK-NTRK2 complex. In vitro, mRNYK demonstrated a neuroprotective effect on SH-SY5Y cells under normal and elevated pressure comparable to RNYK. The 50% effective concentration (logEC50) for mRNYK was 0.7009, which was better than RNYK with a logEC50 of 0.8318. Conclusion: The modified peptide studied herein showed improved stability over the original peptide (RNYK) and demonstrated potential for use as a BDNF agonist with neuroprotective properties for treatment of neurodegenerative disorders such as glaucoma.
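The two logEC50 values quoted in this abstract can be put on a linear scale with a one-line conversion. The sketch assumes the logarithm is base 10, which is the usual convention but is not stated in the abstract, and it leaves the concentration unit open because the abstract does not give one.

```python
def ec50_from_log(log_ec50):
    """Convert a base-10 logEC50 to EC50 in the same (unstated) unit."""
    return 10.0 ** log_ec50


mrnyk_ec50 = ec50_from_log(0.7009)  # modified peptide
rnyk_ec50 = ec50_from_log(0.8318)   # original peptide
fold_difference = rnyk_ec50 / mrnyk_ec50
```

Under the base-10 assumption, the values correspond to EC50 of about 5.0 for mRNYK and about 6.8 for RNYK, a difference of roughly 1.35-fold in favor of the modified peptide.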
... De novo-designed binders that can be created using computational methods against almost any protein with known structure are a tantalizing new class of proteins that could serve as antigen sensors for synthetic receptors. Computational design of protein-protein interactions has a long history 33,34 but recent advances have made this technology readily accessible for synthetic biology applications. Protein docking-based methods have enabled the design of binders against any protein structure 35. ...
Preprint
Synthetic and chimeric receptors capable of recognizing and responding to user-defined antigens have enabled "smart" therapeutics based on engineered cells. These cell engineering tools depend on antigen sensors which are most often derived from antibodies. Advances in the de novo design of proteins have enabled the design of protein binders with the potential to target epitopes with unique properties and faster production timelines compared to antibodies. Building upon our previous work combining a de novo-designed minibinder of the Spike protein of SARS-CoV-2 with the synthetic receptor synNotch (SARSNotch), we investigated whether minibinders can be readily adapted to a diversity of cell engineering tools. We show that the Spike minibinder LCB1 easily generalizes to a next-generation proteolytic receptor SNIPR that performs similarly to our previously reported SARSNotch. LCB1-SNIPR successfully enables the detection of live SARS-CoV-2, an improvement over SARSNotch which can only detect cell-expressed Spike. To test the generalizability of minibinders to diverse applications, we tested LCB1 as an antigen sensor for a chimeric antigen receptor (CAR). LCB1-CAR enabled CD8+ T cells to cytotoxically target Spike-expressing cells. Our findings suggest that minibinders represent a novel class of antigen sensors that have the potential to dramatically expand the sensing repertoire of cell engineering tools.
... Protein-protein interactions (PPI) play a key role in many biological processes. Understanding the mechanisms and functions of PPIs may benefit many areas such as drug discovery (Scott et al. 2016;Athanasios et al. 2017;Macalino et al. 2018) and protein design (Kortemme and Baker 2004;Baker 2006;Lippow and Tidor 2007). Typically, high-resolution 3D structures of protein complexes can be determined using experimental solutions (e.g. ...
Article
Motivation: Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. Results: In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. Availability and implementation: The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
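The ranking loss mentioned in the results can be sketched as follows, using the definition commonly applied when evaluating quality assessment methods: for each target, the true score of the best available model minus the true score of the model that the predictor ranks first. Whether the CASP15 assessment used exactly this form is an assumption here, and the function names and toy numbers are illustrative.

```python
def ranking_loss(true_scores, predicted_scores):
    """Best true score minus the true score of the top-predicted model.

    true_scores: e.g. TM-scores of the candidate models for one target.
    predicted_scores: the quality scores assigned by the predictor.
    A loss of 0 means the predictor picked the best model.
    """
    if not true_scores or len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    top = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return max(true_scores) - true_scores[top]


def mean_ranking_loss(targets):
    """Average the per-target loss over a list of (true, predicted) pairs."""
    return sum(ranking_loss(t, p) for t, p in targets) / len(targets)


# Made-up scores for one target, for illustration only.
toy_true = [0.80, 0.65, 0.90]
toy_pred = [0.70, 0.20, 0.60]
toy_loss = ranking_loss(toy_true, toy_pred)
```

In the toy case the predictor ranks the first model highest although the third has the best true score, so the loss is about 0.10.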
... 592−594 The design of these hydrogen bonds was first explored to promote protein−protein interactions, which was less demanding due to the presence of water molecules and solvent exposures. 595 Joachimiak et al. reported the design of DNase-immunity protein and new binding partners with affinity 300-fold higher to unspecific bindings, by identifying the critical interface residues and sampling alternate rigid body orientations to obtain iterative structure-based mutations. 596 However, a later work by Stranges and Kuhlman suggested the limitations of this approach by comparing five successful cases against 158 failures, both sets of which were designed by Rosetta. ...
Article
Water solubility and structural stability are key merits for proteins defined by the primary sequence and 3D-conformation. Their manipulation represents important aspects of the protein design field that relies on the accurate placement of amino acids and molecular interactions, guided by underlying physicochemical principles. Emulated designer proteins with well-defined properties both fuel the knowledge-base for more precise computational design models and are used in various biomedical and nanotechnological applications. The continuous developments in protein science, increasing computing power, new algorithms, and characterization techniques provide sophisticated toolkits for solubility design beyond guesswork. In this review, we summarize recent advances in the protein design field with respect to water solubility and structural stability. After introducing fundamental design rules, we discuss the transmembrane protein solubilization and de novo transmembrane protein design. Traditional strategies to enhance protein solubility and structural stability are introduced. The designs of stable protein complexes and high-order assemblies are covered. Computational methodologies behind these endeavors, including structure prediction programs, machine learning algorithms, and specialty software dedicated to the evaluation of protein solubility and aggregation, are discussed. The findings and opportunities for Cryo-EM are presented. This review provides an overview of significant progress and prospects in accurate protein design for solubility and stability.
... Protein-protein interactions (PPI) play a key role in almost all biological processes. Understanding the mechanisms and functions of PPI may benefit studies in other scientific areas such as drug discovery [1,2,3] and protein design [4,5,6]. Typically, high-resolution 3D structures of protein complexes can be determined using experimental solutions (e.g., X-ray crystallography and cryo-electron microscopy). ...
Preprint
Motivation: Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. Results: In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. Availability: The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA. Contact: chengji@missouri.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
... The de novo design of the primary sequence of artificial proteins has been studied in the last four decades [2][3][4], and recently has also been described as design from scratch [5][6][7][8][9][10][11]. In early studies, the secondary structures of proteins (α-helix and β-sheet structures) were created synthetically by peptide chemistry 12, with these secondary structures subsequently connected through a loop sequence to construct the more complicated three-dimensional structure 13. ...
Article
Full-text available
The amino-acid sequence of a protein encodes information on its three-dimensional structure and specific functionality. De novo design has emerged as a method to manipulate the primary structure for the development of artificial proteins and peptides with desired functionality. This paper describes the de novo design of a pore-forming peptide, named SV28, that has a β-hairpin structure and assembles to form a stable nanopore in a bilayer lipid membrane. This large synthetic nanopore is an entirely artificial device for practical applications. The peptide forms multidispersely sized nanopore structures ranging from 1.7 to 6.3 nm in diameter and can detect DNAs. To form a monodispersely sized nanopore, we redesigned the SV28 by introducing a glycine-kink mutation. The resulting redesigned peptide forms a monodisperse pore with a diameter of 1.7 nm leading to detection of a single polypeptide chain. Such de novo design of a β-hairpin peptide has the potential to create artificial nanopores, which can be size adjusted to a target molecule.
... Today, computational methods, and the increasing importance of structural data, allow the design of fully synthetic and orthogonal PPI-scaffolds and linkers which can be employed for protein circuits design [20,21]. A wide range of nature-inspired (natural PPIs) or fully synthetic scaffolds have been developed so far. ...
Article
Full-text available
Protein-protein interactions (PPIs) contribute to regulate many aspects of cell physiology and metabolism. Protein domains involved in PPIs are important building blocks for engineering genetic circuits through synthetic biology. These domains can be obtained from known proteins and rationally engineered to produce orthogonal scaffolds, or computationally designed de novo thanks to recent advances in structural biology and molecular dynamics prediction. Such circuits based on PPIs (or protein circuits) appear of particular interest, as they can directly affect transcriptional outputs, as well as induce behavioral/adaptational changes in cell metabolism, without the need for further protein synthesis. This last example was highlighted in recent works to enable the production of fast-responding circuits which can be exploited for biosensing and diagnostics. Notably, PPIs can also be engineered to develop new drugs able to bind specific intra- and extra-cellular targets. In this review, we summarize recent findings in the field of protein circuit design, with particular focus on the use of peptides as scaffolds to engineer these circuits.
... During evolution, proteins have acquired self-assembly properties to construct a variety of large, complex, and symmetric architectures such as one-dimensional (1D) actin filaments 8 , two-dimensional (2D) bacterial surface layers (S-layers) 9 , and three-dimensional (3D) light-harvesting protein complexes of phycobilisomes 10 , thereby endowing their hosts with plenty of functions. It is well known that protein-protein interactions (PPIs) at protein interfaces are the chief contributors to construct the diversified protein nanostructures [11][12][13] . Following Nature's inspiration to assemble protein building blocks into exquisite nanostructures, various self-assembly strategies, such as symmetry-directed design [14][15][16][17] , metal coordination 7,18-20 , host-guest interactions 21,22 , and the use of bifunctional ligands 23,24 , have been applied to construct 1D, 2D, and 3D hierarchical protein nanostructures. ...
Article
Full-text available
Although various artificial protein nanoarchitectures have been constructed, controlling the transformation between different protein assemblies has largely been unexplored. Here, we describe an approach to realize the self-assembly transformation of dimeric building blocks by adjusting their geometric arrangement. Thermotoga maritima ferritin (TmFtn) naturally occurs as a dimer; twelve of these dimers interact with each other in a head-to-side manner to generate 24-meric hollow protein nanocage in the presence of Ca ²⁺ or PEG. By tuning two contiguous dimeric proteins to interact in a fully or partially side-by-side fashion through protein interface redesign, we can render the self-assembly transformation of such dimeric building blocks from the protein nanocage to filament, nanorod and nanoribbon in response to multiple external stimuli. We show similar dimeric protein building blocks can generate three kinds of protein materials in a manner that highly resembles natural pentamer building blocks from viral capsids that form different protein assemblies.
... During evolution, proteins have acquired self-assembly properties to construct a variety of large, complex, and symmetric architectures such as 1D actin filaments 8, 2D bacterial surface layers (S-layers) 9, and 3D light-harvesting protein complexes of phycobilisomes 10, thereby endowing their hosts with plenty of functions. It is well-known that protein-protein interactions (PPIs) at protein interfaces are the chief contributors to construct the diversified protein nanostructures [11][12][13]. Following Nature's inspiration to assemble protein building blocks into exquisite nanostructures, various self-assembly strategies, such as symmetry-directed design [14][15][16][17], metal-coordination 7,18,19, host-guest interactions 20,21 and the use of bifunctional ligands 22,23 have been applied mainly to construct one-, two- and three-dimensional hierarchical protein nanostructures. ...
Preprint
Full-text available
Although various artificial protein nanoarchitectures have been constructed, controlling conversion between protein assemblies with different dimensions has largely been unexplored. Here, we describe a simple, effective approach to regulate conversion between 0D protein nanomaterials and their 1D or 2D analogues by adjusting the geometric arrangement of dimeric protein building blocks. Thermotoga maritima ferritin (TmFtn) naturally occurs as a dimeric protein, twelve of which interact with each other in a head-to-side manner to generate 0D 24-meric protein nanocage in the presence of Ca ²⁺ . By tuning two contiguous dimeric proteins to interact in a fully or partially side-by-side fashion through protein interface redesign, we can render the conversion of the inherent salt-mediated 0D protein nanocage into 1D or 2D nanomaterials in response to multiple external stimuli. Thus, one kind of dimeric protein building block can generate three protein materials with different dimensions in a manner that highly resembles natural pentamer building blocks from viral capsids that form different protein assemblies.
... Many biological functions depend on specific protein-protein interactions. Protein engineering offers the possibility to tune these interactions by developing de novo binding partners [1][2][3] or by mutating the interaction partners using computational design [4,5]. A powerful tool of protein engineering is to generate protein binders by the in vitro directed selection techniques [6][7][8] or to use evolution-based approaches to increase the stability of recombinant proteins [9][10][11]. ...
Article
Full-text available
Engineered small non-antibody protein scaffolds are a promising alternative to antibodies and are especially attractive for use in protein therapeutics and diagnostics. The advantages include smaller size and a more robust, single-domain structural framework with a defined binding surface amenable to mutation. This calls for a more systematic approach in designing new scaffolds suitable for use in one or more methods of directed evolution. We hereby describe a process based on an analysis of protein structures from the Protein Data Bank and their experimental examination. The candidate protein scaffolds were subjected to a thorough screening including computational evaluation of the mutability, and experimental determination of their expression yield in E. coli, solubility, and thermostability. In the next step, we examined several variants of the candidate scaffolds including their wild types and alanine mutants. We proved the applicability of this systematic procedure by selecting a monomeric single-domain human protein with a fold different from previously known scaffolds. The newly developed scaffold, called ProBi (Protein Binder), contains two independently mutable surface patches. We demonstrated its functionality by training it as a binder against human interleukin-10, a medically important cytokine. The procedure yielded scaffold-related variants with nanomolar affinity.
... Numerous theoretical simulation methods, such as Z-Dock, Rosetta, Amber, and GRAMM-X, have been developed to guide the de novo design of the interfacial properties and recognition behaviors of protein building blocks with atomic resolution (Kortemme and Baker, 2004;Rodrigues and Bonvin, 2014). ...
Article
Full-text available
Diverse natural/artificial proteins have been used as building blocks to construct a variety of well-ordered nanoscale structures over the past couple of decades. Sophisticated protein self-assemblies have attracted great scientific interests due to their potential applications in disease diagnosis, illness treatment, biomechanics, bio-optics and bio-electronics, etc. This review outlines recent efforts directed to the creation of structurally defined protein assemblies including one-dimensional (1D) strings/rings/tubules, two-dimensional (2D) planar sheets and three-dimensional (3D) polyhedral scaffolds. We elucidate various innovative strategies for manipulating proteins to self-assemble into desired architectures. The emergent applications of protein assemblies as versatile platforms in medicine and material science with improved performances have also been discussed.
... It is also reported that inter-protein contact prediction is useful for the 3D structure modeling of a PPI or protein docking [16,25,26]. A few computational methods have been developed to predict protein-level interaction, including phylogenetic profiling [27], force field methods [28], genomic co-localization [29,30] and others [31][32][33]. Nevertheless, these methods can only predict if two proteins interact or not, but not which residue pairs may interact or form a contact. ...
Preprint
Full-text available
Intra-protein residue-level contact prediction has attracted much attention in recent years and made very good progress, but far fewer methods are dedicated to inter-protein contact prediction, which is important for understanding how proteins interact at the structure and residue level. Direct coupling analysis (DCA) is popular for intra-protein contact prediction, but extending it to inter-protein contact prediction is challenging since it requires too many interlogs (i.e., interacting homologs) to be effective, a requirement that cannot easily be met, especially for a putative interacting protein pair in eukaryotes. We show that deep learning, even trained by only intra-protein contact maps, works much better than DCA for inter-protein contact prediction. We also show that a phylogeny-based method can generate a better multiple sequence alignment for eukaryotes than existing genome-based methods and thus lead to better inter-protein contact prediction. Our method shall be useful for protein docking, protein interaction prediction and protein interaction network construction.
... The advent of computational protein evolution (also known as protein design) [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16] opens the possibility to address fundamental questions about the nature of the amino acid alphabet [17][18][19][20] . Protein design consists in searching for protein sequences capable of folding into a given backbone conformation. ...
Article
Full-text available
Isolating the properties of proteins that allow them to convert sequence into the structure is a long-lasting biophysical problem. In particular, studies focused extensively on the effect of a reduced alphabet size on the folding properties. However, the natural alphabet is a compromise between versatility and optimisation of the available resources. Here, for the first time, we include the impact of the relative availability of the amino acids to extract from the 20 letters the core necessary for protein stability. We present a computational protein design scheme that involves the competition for resources between a protein and a potential interaction partner that, additionally, gives us the chance to investigate the effect of the reduced alphabet on protein-protein interactions. We devise a scheme that automatically identifies the optimal reduced set of letters for the design of the protein, and we observe that even alphabets reduced down to 4 letters allow for single protein folding. However, it is only with 6 letters that we achieve optimal folding, thus recovering experimental observations. Additionally, we notice that the binding between the protein and a potential interaction partner could not be avoided with the investigated reduced alphabets. Therefore, we suggest that aggregation could have been a driving force in the evolution of the large protein alphabet.
... Such a large alphabet provides proteins with the huge assortment of configurations and functions that we know so far. The advent of artificial protein evolution (also known as protein design) (Dahiyat and Mayo, 1997;Koehl and Levitt, 1999;Kortemme and Baker, 2004;Fung et al., 2008;Coluzza, 2011;Koga et al., 2012;Coluzza, 2014;Sevy et al., 2015;Bianco et al., 2017;Chevalier et al., 2017;Marcos et al., 2017;Nerattini et al., 2018) opens the possibility to address major questions regarding the nature of the amino acid alphabet (Davidson and Sauer, 1994;Davidson et al., 1995;Cordes et al., 1996;Riddle et al., 1997). One of the questions that has most attracted the attention of the scientific community concerns the universality of the 20 letters. ...
Article
Understanding the origin of the twenty-letter alphabet of proteins is a substantial biophysical and non-deterministic polynomial hard problem. Studies have concentrated in particular on the effect of a reduced alphabet on folding properties. However, the natural alphabet is a compromise between versatility and optimisation of the available resources. In the current work, the additional effect of amino acid availability is incorporated into a protein design strategy that involves competition for resources between a protein and a prospective interaction partner. It also allows exploring the effect of the reduced alphabet on protein-protein interactions. The optimal reduced set of letters for protein design is identified, and alphabets reduced to 4 letters were observed to allow single protein folding. However, only with 6 letters is optimal folding achieved, thereby recovering experimental observations. It was also observed that binding between the protein and a potential interaction partner could not be avoided with the reduced alphabets studied. We therefore suggest that aggregation could have been the main driving force for the evolution of the large protein alphabet. Communicated by Ramaswamy H. Sarma
... Therefore, the atomic structures of protein-protein complexes are valuable for investigating the interaction mechanism and thus developing potential drugs [13][14][15][16]. With the rapid development of structural proteomics projects in the past decades, the 3D structures of many proteins have been solved and deposited in the Protein Data Bank (PDB) [17]. ...
Article
Full-text available
Background: Protein-protein docking is a valuable computational approach for investigating protein-protein interactions. Shape complementarity is the most basic component of a scoring function and plays an important role in protein-protein docking. Despite significant progresses, shape representation remains an open question in the development of protein-protein docking algorithms, especially for grid-based docking approaches. Results: We have proposed a new pairwise shape-based scoring function (LSC) for protein-protein docking which adopts an exponential form to take into account long-range interactions between protein atoms. The LSC scoring function was incorporated into our FFT-based docking program and evaluated for both bound and unbound docking on the protein docking benchmark 4.0. It was shown that our LSC achieved a significantly better performance than four other similar docking methods, ZDOCK 2.1, MolFit/G, GRAMM, and FTDock/G, in both success rate and number of hits. When considering the top 10 predictions, LSC obtained a success rate of 51.71% and 6.82% for bound and unbound docking, respectively, compared to 42.61% and 4.55% for the second-best program ZDOCK 2.1. LSC also yielded an average of 8.38 and 3.94 hits per complex in the top 1000 predictions for bound and unbound docking, respectively, followed by 6.38 and 2.96 hits for the second-best ZDOCK 2.1. Conclusions: The present LSC method will not only provide an initial-stage docking approach for post-docking processes but also have a general implementation for accurate representation of other energy terms on grids in protein-protein docking. The software has been implemented in our HDOCK web server at http://hdock.phys.hust.edu.cn/.
... Hence, although single point mutations might not conserve protein stability, multiple alterations must occur simultaneously among interacting residues. [18,19] Co-evolution events could involve residues that are crucial for the protein activity (e.g. catalytic site residues), for the stability of the native structure (e.g. hydrophobic core residues) or, in some cases, for both. In other words, two functional residues participating in the ligand binding site of an enzyme must co-evolve to keep the efficiency of the catalytic reaction high, while two structural residues in the core of a protein or at the interface between two binding proteins cannot evolve independently without negatively affecting the protein stability or the binding affinity. ...
Article
Full-text available
Protein sequence stores the information relative to both functionality and stability, thus making it difficult to disentangle the two contributions. However, the identification of critical residues for function and stability has important implications for the mapping of the proteome interactions, as well as for many pharmaceutical applications, e. g. the identification of ligand binding regions for targeted pharmaceutical protein design. In this work, we propose a computational method to identify critical residues for protein functionality and stability and to further categorise them in strictly functional, structural and intermediate. We evaluate single site conservation and use Direct Coupling Analysis (DCA) to identify co‐evolved residues both in natural and artificial evolution processes. We reproduce artificial evolution using protein design and base our approach on the hypothesis that artificial evolution in the absence of any functional constraint would exclusively lead to site conservation and co‐evolution events of the structural type. Conversely, natural evolution intrinsically embeds both functional and structural information. By comparing the lists of conserved and co‐evolved residues, outcomes of the analysis on natural and artificial evolution, we identify the functional residues without the need of any a priori knowledge of the biological role of the analysed protein.
... Determining protein interacting partners experimentally is both time-consuming and expensive. Also, given the magnitude of the problem, where estimates say there could be up to 130,000 PPIs in the human proteome alone (1)(2)(3), it is imperative that we develop computational techniques to predict such interactions (4)(5)(6). ...
Article
Full-text available
Our web server, PIZSA (http://cospi.iiserpune.ac.in/pizsa), assesses the likelihood of protein-protein interactions by assigning a Z Score computed from interface residue contacts. Our score takes into account the optimal number of atoms that mediate the interaction between pairs of residues and whether these contacts emanate from the main chain or side chain. We tested the score on 174 native interactions for which 100 decoys each were constructed using ZDOCK. The native structure scored better than any of the decoys in 146 cases and was able to rank within the 95th percentile in 162 cases. This easily outperforms a competing method, CIPS. We also benchmarked our scoring scheme on 15 targets from the CAPRI dataset and found that our method had results comparable to that of CIPS. Further, our method is able to analyse higher order protein complexes without the need to explicitly identify chains as receptors or ligands. The PIZSA server is easy to use and could be used to score any input three-dimensional structure and provide a residue pair-wise break up of the results. Attractively, our server offers a platform for users to upload their own potentials and could serve as an ideal testing ground for this class of scoring schemes.
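The decoy benchmark described in this abstract (one native interface scored against 100 docking decoys) can be sketched as follows. This is only an illustration of ranking a native score against a decoy distribution. It is not PIZSA's scoring function, whose contact potentials are not given here. The function names, the convention that a higher score is better, and the numbers are all assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(native: float, decoys: Sequence[float]) -> float:
    """Standardize the native score against the decoy score distribution."""
    spread = pstdev(decoys)
    if spread == 0:
        raise ValueError("decoy scores have zero spread")
    return (native - mean(decoys)) / spread


def native_percentile(native: float, decoys: Sequence[float]) -> float:
    """Percentage of decoys that score strictly worse than the native.

    Assumes higher scores indicate a more native-like interface.
    """
    beaten = sum(1 for d in decoys if d < native)
    return 100.0 * beaten / len(decoys)


# Hypothetical scores for one complex and four decoys.
decoy_scores = [1.0, 2.0, 3.0, 4.0]
native_score = 5.0
z = z_score(native_score, decoy_scores)
pct = native_percentile(native_score, decoy_scores)
```

Under these assumptions, a native that "scored better than any of the decoys" corresponds to a percentile of 100, and "within the 95th percentile" corresponds to a value of at least 95.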
... The advent of artificial protein evolution (also known as protein design) (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16) opens the possibility to address fundamental questions about the nature of the amino acid alphabet (17)(18)(19)(20). One of the questions that mostly attracted the attention of the scientific community was about the universality of the 20 letters. ...
Preprint
Full-text available
Understanding the origin of the 20 letter alphabet of proteins is a long-lasting biophysical problem. In particular, studies focused extensively on the effect of a reduced alphabet size on the folding properties. However, the natural alphabet is a compromise between versatility and optimisation of the available resources. Here, for the first time, we include the additional impact of the relative availability of the amino acids. We present a protein design scheme that involves the competition for resources between a protein and a potential interaction partner that, additionally, gives us the chance to investigate the effect of the reduced alphabet on protein-protein interactions. We identify the optimal reduced set of letters for the design of the protein, and we observe that even alphabets reduced down to 4 letters allow for single protein folding. However, it is only with 6 letters that we achieve optimal folding, thus recovering experimental observations. Additionally, we notice that the binding between the protein and a potential interaction partner could not be avoided with the investigated reduced alphabets. Therefore, we suggest that aggregation could have been a driving force for the evolution of the large protein alphabet.
... In particular, methods that exploit the close links between the function and the structure of a protein have been developed to pre-filter the combinations of mutations most relevant to the desired properties, and thus reduce experimental testing to a small number of proteins. Despite the progress made in the field of Computational Protein Design [1][2][3][4][5][6][7], new algorithms, methods and computational tools are needed to assess more precisely the impact of mutations on the desired property, and thereby improve the reliability of predictions about the molecular modifications to be made within proteins. Such advances are essential to accelerate the design of proteins or (macro)molecular assemblies suited to use in sustainable and competitive processes of interest to a whole range of sectors (medical, cosmetic, environmental, bioenergy, agri-food, pharmaceutical, etc.). ...
Thesis
This thesis addresses two intrinsically linked topics: computing the normalization constant of a Markov random field and estimating the binding affinity of a protein complex. First, to tackle this #P-complete counting problem, we developed Z*, which is based on pruning negligible quantities of potential. It outperformed state-of-the-art methods on instances derived from protein-protein interactions. We then developed #HBFS, an algorithm with an anytime guarantee, which outperformed its predecessor. Finally, we developed BTDZ, an exact algorithm based on tree decomposition, which proved effective on instances derived from intermolecular interactions known as "superhelices". These algorithms rely on methods from graphical models: local consistencies, variable elimination and tree decompositions. Using existing optimization methods, Z* and the Rosetta energy functions, we developed open-source software that estimates the affinity constant of a protein-protein complex over a library of mutants. We analyzed our estimates on a dataset of protein complexes and compared them with two state-of-the-art approaches. Our tool proved qualitatively better than these methods.
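For orientation, the normalization constant (partition function) of a small pairwise Markov random field can be computed by brute-force enumeration, as in the sketch below. This is not Z*, #HBFS or BTDZ. It is the naive baseline whose cost grows exponentially with the number of variables, which is the motivation for pruning and tree-decomposition methods. The data layout (`unary`, `pairwise`) and the parameter `kT` are assumptions made for this illustration.

```python
import itertools
import math
from typing import Dict, List, Tuple


def partition_function(unary: List[List[float]],
                       pairwise: Dict[Tuple[int, int], List[List[float]]],
                       kT: float = 1.0) -> float:
    """Sum exp(-E(x)/kT) over every joint assignment x.

    unary[i][a] is the energy of variable i in state a;
    pairwise[(i, j)][a][b] is the energy of variables i and j
    being in states a and b respectively.
    """
    domains = [range(len(states)) for states in unary]
    total = 0.0
    for assignment in itertools.product(*domains):
        energy = sum(unary[i][a] for i, a in enumerate(assignment))
        energy += sum(table[assignment[i]][assignment[j]]
                      for (i, j), table in pairwise.items())
        total += math.exp(-energy / kT)
    return total


# Hypothetical toy model: two positions with two candidate states each
# and flat (zero) energies, so all four joint states have weight 1.
flat_unary = [[0.0, 0.0], [0.0, 0.0]]
flat_pairwise = {(0, 1): [[0.0, 0.0], [0.0, 0.0]]}
z_flat = partition_function(flat_unary, flat_pairwise)
```

In protein design settings, each variable would typically stand for a residue position and each state for a side-chain conformation, which is an interpretation assumed here rather than stated in the abstract.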
... Both have individually been approached successfully by a number of studies. [35][36][37] We have developed a computational protein engineering approach that combines the two, consisting of a method for the elaboration of existing protein domains containing functional active sites by introducing new ligand-specific binding/catalytic groups at the domain-domain interface, while using variable linker segments as guiding connectors for the inserted domain. In this study, we describe an in silico benchmark in which we successfully recapitulate the sequences and domain placements of α/β hydrolase insert-domain protein complexes, as well as domain placements for a previously described inserted-domain benchmark (Gray set 18). ...
Article
The computational design of novel nested proteins – in which the primary structure of one protein domain (insert) is flanked by the primary structure segments of another (parent) – would enable the generation of multifunctional proteins. Here we present a new algorithm, called Loop-Directed Domain Insertion (LooDo), implemented within the Rosetta software suite, for the purpose of designing nested protein domain combinations connected by flexible linker regions. Conformational space for the insert domain is sampled using large libraries of linker fragments for linker-to-parent domain superimposition followed by insert-to-linker superimposition. The relative positioning of the two domains (treated as rigid bodies) is sampled efficiently by a grid-based, mutual placement compatibility search. The conformations of the loop residues, and the identities of loop as well as interface residues, are simultaneously optimized using a generalized kinematic loop closure algorithm and Rosetta EnzymeDesign, respectively, to minimize interface energy. The algorithm was found to consistently sample near-native conformations and interface sequences for a benchmark set of structurally similar but functionally divergent domain-inserted enzymes from the α/β hydrolase superfamily, and discriminates well between native and nonnative conformations and sequences, although loop conformations tended to deviate from the native conformations. Furthermore, in cross-domain placement tests, native insert-parent domain combinations were ranked as the best-scoring structures compared to nonnative domain combinations. This algorithm should be broadly applicable to the design of multi-domain protein complexes with any combination of inserted or tandem domain connections.
... A strategy that is currently applied on- and off-lattice to design problematic structures or just to improve the stability of the artificial sequences is the so-called 'Negative Design' method. The method consists of minimizing the energy of the target structure with respect to the free energy of all the other compact configurations, which is estimated using a set of decoy configurations [13,18,29,33,75,77-87] ...
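The negative design criterion described in this snippet can be written as a gap between the target energy and a decoy-based free energy. The sketch below assumes the standard estimate F = -kT ln Σ exp(-E_d/kT) over a finite decoy set. The function names, the parameter `kT` and the example energies are hypothetical and do not come from the cited review.

```python
import math
from typing import Sequence


def decoy_free_energy(decoy_energies: Sequence[float], kT: float = 1.0) -> float:
    """Estimate F = -kT * ln(sum_d exp(-E_d / kT)) from a decoy set.

    The minimum energy is factored out first so the exponentials
    do not overflow (log-sum-exp trick).
    """
    if not decoy_energies:
        raise ValueError("at least one decoy energy is required")
    lowest = min(decoy_energies)
    shifted_sum = sum(math.exp(-(e - lowest) / kT) for e in decoy_energies)
    return lowest - kT * math.log(shifted_sum)


def design_gap(target_energy: float,
               decoy_energies: Sequence[float],
               kT: float = 1.0) -> float:
    """Target energy minus decoy free energy; more negative is better."""
    return target_energy - decoy_free_energy(decoy_energies, kT)


# Hypothetical energies of one candidate sequence threaded onto the
# target structure and onto two competing compact decoys.
gap = design_gap(-5.0, [0.0, 0.0])
```

A sequence search would then prefer candidates with a more negative gap, so that the target is favoured over the ensemble of alternative compact states and not merely low in energy on its own.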
Article
Full-text available
Proteins are one of the most versatile modular assembling systems in nature. Experimentally, more than 110,000 protein structures have been identified and more are deposited every day in the Protein Data Bank. Such an enormous structural variety is to a first approximation controlled by the sequence of amino acids along the peptide chain of each protein. Understanding how the structural and functional properties of the target can be encoded in this sequence is the main objective of protein design. Unfortunately, rational protein design remains one of the major challenges across the disciplines of biology, physics and chemistry. The implications of solving this problem are enormous and branch into materials science, drug design, evolution and even cryptography. For instance, in the field of drug design an effective computational method to design protein-based ligands for biological targets such as viruses, bacteria or tumour cells, could give a significant boost to the development of new therapies with reduced side effects. In materials science, self-assembly is a highly desired property and soon artificial proteins could represent a new class of designable self-assembling materials. The scope of this review is to describe the state of the art in computational protein design methods and give the reader an outline of what developments could be expected in the near future.
... With the rapid development of computational biology, many theoretical simulation methods, such as Rosetta, Z-Dock, GRAMM-X, and other commercially available programs, have been established as effective tools to guide experimental design for protein self-assembly with atomic resolution. 47,48 These methods provide a wide variety of sampling algorithms to search the conformational space of proteins and scoring functions to deduce the possible binding affinities, 49,50 which opens an alternative way to develop protein assemblies by redesigning the recognition specificity of existing interfaces to manipulate PPIs in order to obtain a targeted superstructure. 51 The accuracies of these predictions strongly depend on the quality of the energy function and the adequate sampling of conformations. ...
Article
Nature endows life with a wide variety of sophisticated, synergistic, and highly functional protein assemblies. Following Nature's inspiration to assemble protein building blocks into exquisite nanostructures is emerging as a fascinating research field. Dictating protein assembly to obtain highly ordered nanostructures and sophisticated functions not only provides a powerful tool to understand the natural protein assembly process but also offers access to advanced biomaterials. Over the past couple of decades, the field of protein assembly has undergone unexpected and rapid developments, and various innovative strategies have been proposed. This Review outlines recent advances in the field of protein assembly and summarizes several strategies, including biotechnological strategies, chemical strategies, and combinations of these approaches, for manipulating proteins to self-assemble into desired nanostructures. The emergent applications of protein assemblies as versatile platforms to design a wide variety of attractive functional materials with improved performances have also been discussed. The goal of this Review is to highlight the importance of this highly interdisciplinary field and to promote its growth in a diverse variety of research fields ranging from nanoscience and material science to synthetic biology.
... Moreover, characterizing these interactions at the atomic level can help in rationally designing new therapeutic agents that can either enhance or inhibit these interactions. Constructing a three dimensional structure of such protein complexes is an essential step toward identifying their binding interface and recognizing any hot spots that can be targeted for their regulation (Elcock et al. 2001;Kann, 2007;Kortemme & Baker, 2004). ...
Article
Controllable protein nanoarchitectonics refers to the process of manipulating and controlling the assembly of proteins at the nanoscale to achieve domain‐limited and accurate spatial arrangement. In nature, many proteins undergo precise self‐assembly with other structural domains to engage in synergistic physiological activities. Protein nanomaterials prepared through protein nanosizing have received considerable attention due to their excellent biocompatibility, low toxicity, modifiability, and versatility. This review focuses on the fundamental strategies used for controllable protein nanoarchitectonics, which include computational design, self‐assembly induction, template introduction, complexation induction, chemical modification, and in vivo assembly. Precise control of the nanosizing process has enabled the creation of protein nanostructures with different dimensions, including 0D spherical oligomers, 1D nanowires, nanorings, and nanotubes, as well as 2D nanofilms and 3D protein nanocages. The unique biological properties of proteins hold promise for diverse applications of these protein nanomaterials, including in biomedicine, the food industry, agriculture, biosensing, environmental protection, biocatalysis, and artificial light harvesting. Protein nanosizing is a powerful tool for developing biomaterials with advanced structures and functions.
Article
The ERM (ezrin, radixin, and moesin) family of proteins and the related protein merlin participate in scaffolding and signaling events at the cell cortex. The proteins share an N-terminal FERM [band four-point-one (4.1) ERM] domain composed of three subdomains (F1, F2, and F3) with binding sites for short linear peptide motifs. By screening the FERM domains of the ERMs and merlin against a phage library that displays peptides representing the intrinsically disordered regions of the human proteome, we identified a large number of novel ligands. We determined the affinities for the ERM and merlin FERM domains interacting with 18 peptides and validated interactions with full-length proteins through pull-down experiments. The majority of the peptides contained an apparent Yx[FILV] motif; others show alternative motifs. We defined distinct binding sites for two types of similar but distinct binding motifs (YxV and FYDF) using a combination of Rosetta FlexPepDock computational peptide docking protocols and mutational analysis. We provide a detailed molecular understanding of how the two types of peptides with distinct motifs bind to different sites on the moesin FERM phosphotyrosine binding-like subdomain and uncover interdependencies between the different types of ligands. The study expands the motif-based interactomes of the ERMs and merlin and suggests that the FERM domain acts as a switchable interaction hub.
Article
In this study, we constructed a semiartificial protein assembly of alternating ring type, which was modified from the natural assembly state via incorporation of a synthetic component at the protein interface. For the redesign of a natural protein assembly, a scrap-and-build approach employing chemical modification was used. Two different protein dimer units were designed based on peroxiredoxin from Thermococcus kodakaraensis, which originally forms a dodecameric hexagonal ring with six homodimers. The two dimeric mutants were reorganized into a ring by reconstructing the protein-protein interactions via synthetic naphthalene moieties introduced by chemical modification. Cryo-electron microscopy revealed the formation of a uniquely shaped dodecameric hexagonal protein ring with broken symmetry, distorted from the regular hexagon of the wild-type protein. The artificially installed naphthalene moieties were arranged at the interfaces of dimer units, forming two distinct protein-protein interactions, one of which is highly unnatural. This study deciphered the potential of the chemical modification technique that constructs semiartificial protein structures and assembly hardly accessible by conventional amino acid mutations.
Article
Protein assembly is the structural basis for the collaboration between proteins to accomplish life activities due to its realization of the domain‐limited and precise spatial arrangement of proteins. Therefore, artificial manipulation of protein self‐assembly has profound implications in areas such as exploring the mysteries of life and developing biomaterials. In this review, we not only summarize the classical assembly strategies and the structures of exquisite protein assemblies, but also aim to generalize the flexible and controllable assembly tools and the “interaction” between the assembled protein‐based materials and the external environment. On the basis of this, the application, challenges and further development of protein assembly in biology are reviewed. Artificial control of protein self‐assembly can help to explain the information exchange and cooperation between proteins, and opens the way for the development of novel biomaterials. In this paper, the classical protein assembly construction tools and the derived controllable design strategies are summarized and analyzed. In addition, the applications of protein assemblies in biology are also reviewed.
Article
Metalloenzymes play essential roles in biology, whereas artificial metalloenzymes use synthetic metal cofactors to promote non-natural reactions. In the past decades, tremendous advances have been made in manipulating artificial metalloenzymes for various organic transformation reactions, including C–H activation, C–C coupling, transfer hydrogenation, etc. Advanced methods such as directed evolution, high-throughput screening, and rational design have stimulated artificial metalloenzyme research. Applications of artificial metalloenzymes have been extended to cells for controlling functions like prodrug activation. For more complicated processes, such as multistep reactions or the isolation of reaction environments, nature usually uses sophisticated strategies such as positional assembly and compartmentalization of catalysts. However, artificial metalloenzyme research in this direction remains comparatively scarce. Several researchers have designed and constructed various protein assembly structures through metal coordination, but only a few of these structures have been tested for catalytic activity. Assembled metalloenzymes have multiple advantages, such as promoting multistep reactions, stabilizing the catalyst, cooperativity in the reaction, higher-order complexity, sophisticated structures, and confinement of the reaction. Therefore, systematic investigations of their design, structure, and activity are necessary to establish them as next-generation biocatalysts. In this context, the current review highlights the importance of self-assembled metalloenzymes, available design strategies, current developments, catalytic activities, and the future direction of the research.
Article
Interfaces of contact between proteins play important roles in determining the proper structure and function of protein-protein interactions (PPIs). Therefore, to fully understand PPIs, we need to better understand the evolutionary design principles of PPI interfaces. Previous studies have uncovered that interfacial sites are more evolutionarily conserved than other surface protein sites. Yet, little is known about the nature and relative importance of evolutionary constraints in PPI interfaces. Here, we explore constraints imposed by the structure of the microenvironment surrounding interfacial residues on residue evolutionary rate using a large dataset of over 700 structural models of baker’s yeast PPIs. We find that interfacial residues are, on average, systematically more conserved than all other residues with a similar degree of total burial as measured by relative solvent accessibility (RSA). Besides, we find that RSA of the residue when the PPI is formed is a better predictor of interfacial residue evolutionary rate than RSA in the monomer state. Furthermore, we investigate four structure-based measures of residue interfacial involvement, including change in RSA upon binding (ΔRSA), number of residue-residue contacts across the interface, and distance from the center or the periphery of the interface. Integrated modeling for evolutionary rate prediction in interfaces shows that ΔRSA plays a dominant role among the four measures of interfacial involvement, with minor, but independent contributions from other measures. These results yield insight into the evolutionary design of interfaces, improving our understanding of the role that structure plays in the molecular evolution of PPIs at the residue level.
Article
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins’ most remarkable feature is their modularity. The large amount of information required to specify each protein’s function is analogically encoded with an alphabet of just ~20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Article
Aptamers are short oligonucleotides (DNA/RNA) or peptide molecules that can selectively bind to their specific targets with high specificity and affinity. As a powerful new class of amino acid ligands, aptamers have high potentials in biosensing, therapeutic, and diagnostic fields. Here, we present AptaNet—a new deep neural network—to predict the aptamer–protein interaction pairs by integrating features derived from both aptamers and the target proteins. Aptamers were encoded by using two different strategies, including k-mer and reverse complement k-mer frequency. Amino acid composition (AAC) and pseudo amino acid composition (PseAAC) were applied to represent target information using 24 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied a neighborhood cleaning algorithm. The predictor was constructed based on a deep neural network, and optimal features were selected using the random forest algorithm. As a result, 99.79% accuracy was achieved for the training dataset, and 91.38% accuracy was obtained for the testing dataset. AptaNet achieved high performance on our constructed aptamer-protein benchmark dataset. The results indicate that AptaNet can help identify novel aptamer–protein interacting pairs and build more-efficient insights into the relationship between aptamers and proteins. Our benchmark dataset and the source codes for AptaNet are available in: https://github.com/nedaemami/AptaNet.
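The k-mer and reverse complement k-mer frequency encodings named in this abstract can be illustrated with a minimal sketch. This is not the AptaNet code (which is in the linked repository); the function names, the DNA-only alphabet, and the normalization by the number of windows are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: DNA aptamers over the alphabet A, C, G, T.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_frequencies(seq, k):
    """Frequency of every possible DNA k-mer among the windows of seq."""
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        "".join(p): counts["".join(p)] / windows
        for p in product("ACGT", repeat=k)
    }


def rc_kmer_frequencies(seq, k):
    """Frequencies with each k-mer merged with its reverse complement."""
    merged = Counter()
    for kmer, freq in kmer_frequencies(seq, k).items():
        # Use the lexicographically smaller of the pair as the shared key.
        canonical = min(kmer, reverse_complement(kmer))
        merged[canonical] += freq
    return dict(merged)
```

Merging reverse complements shrinks the feature vector (for k = 2, from 16 entries to 10), which is the usual motivation for the second encoding.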
Chapter
Protein-protein docking algorithms are powerful computational tools, capable of analyzing the protein-protein interactions at the atomic-level. In this chapter, we will review the theoretical concepts behind different protein-protein docking algorithms, highlighting their strengths as well as their limitations and pointing to important case studies for each method. The methods we intend to cover in this chapter include various search strategies and scoring techniques. This includes exhaustive global search, fast Fourier transform search, spherical Fourier transform-based search, direct search in Cartesian space, local shape feature matching, geometric hashing, genetic algorithm, randomized search, and Monte Carlo search. We will also discuss the different ways that have been used to incorporate protein flexibility within the docking procedure and some other future directions in this field, suggesting possible ways to improve the different methods.
Article
We recently introduced protein-metal-organic frameworks (protein-MOFs) as chemically designed protein crystals, composed of ferritin nodes that predictably assemble into 3D lattices upon coordination of various metal ions and ditopic, hydroxamate-based linkers. Owing to their unique tripartite construction, protein-MOFs possess extremely sparse lattice connectivity, suggesting that they might display unusual thermomechanical properties. Leveraging the synthetic modularity of ferritin-MOFs, we investigated the temperature-dependent structural dynamics of six distinct frameworks. Our results show that the thermostabilities of ferritin-MOFs can be tuned through the metal component or the presence of crowding agents. Our studies also reveal a framework that undergoes a reversible and isotropic first-order phase transition near room temperature, corresponding to a 4% volumetric change within ≤1 °C and a hysteresis window of ~10 °C. This highly cooperative crystal-to-crystal transformation, which stems from the soft crystallinity of ferritin-MOFs, illustrates the advantage of modular construction strategies in discovering tunable, and unpredictable, material properties.
Article
Protein engineering is an attractive approach for the self-assembly of nanometre-scale architectures for a range of potential nanotechnologies. Using the versatile chemistry provided by protein folding and assembly, coupled with amino acid side-chain functionality, allows for the construction of precise molecular “protein origami” hierarchical patterned structures for a range of nano-applications such as stand-alone enzymatic pathways and molecular machines. The Staphylococcus aureus surface protein SasG is a rigid, rod-like structure shown to have high mechanical strength due to ‘clamp-like’ intra-domain features and a stabilising interface between G5 and E domains, making it an excellent building block for molecular self-assembly. Here we characterise a new two-subunit system composed of the SasG rod protein genetically conjugated with de novo designed coiled-coils, resulting in the self-assembly of fibrils. Circular dichroism (CD) and quartz-crystal microbalance with dissipation (QCM-D) are used to show the specific, alternating binding between the two subunits. Furthermore, we use atomic force microscopy (AFM) to study the extent of subunit polymerisation in a liquid environment, demonstrating self-assembly culminating in the formation of linear macromolecular fibrils.
Article
Proteins are a class of nanoscale building block with remarkable chemical complexity and sophistication: their diverse functions, shapes, and symmetry as well as atomically monodisperse structures far surpass the range of conventional nanoparticles that can be accessed synthetically. The chemical topologies of proteins that drive their assembly into materials are central to their functions in nature. However, despite the importance of protein materials in biology, efforts to harness these building blocks synthetically to engineer new materials have been impeded by the chemical complexity of protein surfaces, making it difficult to reliably design protein building blocks that can be robustly transformed into targeted materials. Here we describe our work aimed at exploiting a simple but important concept: if one could exchange complex protein-protein interactions with well-defined and programmable DNA-DNA interactions, one could control the assembly of proteins into structurally well-defined oligomeric and polymeric materials and three-dimensional crystals. As a class of nanoscale building block, proteins with surface DNA modifications have a vast design space that exceeds what is practically and conceptually possible with their inorganic counterparts: the sequences of the DNA and protein and the chemical nature and position of DNA attachment all play roles in dictating the assembly behavior of protein-DNA conjugates. We summarize how each of these design parameters can influence structural outcome, beginning with proteins with a single surface DNA modification, where energy barriers between protein monomers can be tuned through the sequence and secondary structure of the oligonucleotide. We then explore challenges and progress in designing directional interactions and valency on protein surfaces. 
The directional binding properties of protein-DNA conjugates are ultimately imposed by the amino acid sequence of the protein, which defines the spatial distribution of DNA modification sites on the protein. Through careful design and mutagenesis, bivalent building blocks that bind directionally to form one-dimensional assemblies can be realized. Finally, we discuss the assembly of proteins densely modified with DNA into crystalline superlattices. At first glance, these protein building blocks display crystallization behavior remarkably similar to that of their DNA-functionalized inorganic nanoparticle counterparts, which allows design principles elucidated for DNA-guided nanoparticle crystallization to be used as predictive tools in determining structural outcomes in protein systems. Proteins additionally offer design handles that nanoparticles do not: unlike nanoparticles, the number and spatial distribution of DNA can be controlled through the protein sequence and DNA modification chemistry. Changing the spatial distributions of DNA can drive otherwise identical proteins down distinct crystallization pathways and yield building blocks with exotic distributions of DNA that crystallize into structures that are not yet attainable using isotropically functionalized particles. We highlight challenges in accessing other classes of architectures and establishing general design rules for DNA-mediated protein assembly. Harnessing surface DNA modifications to build protein materials creates many opportunities to realize new architectures and answer fundamental questions about DNA-modified nanostructures in both materials and biological contexts. Proteins with surface DNA modifications are a powerful class of nanomaterial building blocks for which the DNA and protein sequences and the nature of their conjugation dictate the material structure.
Article
Molecular recognition is a critical process for many biological functions and consists in the non-covalent binding of different molecules, such as protein-protein, antigen-antibody and many others. The host-guest molecules involved often show shape complementarity, and one of the leading requirements for molecular recognition is that the interaction must be selective, i.e. the host should bind strongly to one selected guest and poorly, if at all, to all other biomolecules. Our work focuses on the role played by chemical heterogeneity and steric compatibility in the selectivity of the binding site between two proteins. We tackle the problem computationally, reducing the complexity of the system by simulating a protein and a surface-like element that shapes part of the protein and represents the binding site of an interaction partner. We investigate four systems, differing in terms of binding site size. A significant result is that, despite the fact that protein and surface chemical sequences are interdependent and simultaneously generated to stabilise the bound folded structure, the protein is stable in the folded conformation even in the absence of the surface-like partner for all investigated systems. We observe that an increase of the surface area results in a significant increase of the binding affinity. Interestingly, our data suggest the presence of upper and lower limits for the area available to a binding site. Our data match the experimental observation of such limits (750-1500 Ų) and provide a rationale for them: the extent of the binding site area is limited by the value of the binding constant. For large contact areas, at physiological conditions, the binding is orders of magnitude stronger (Ka > 10⁴⁰ l/mol) than what is typically observed in natural biological processes. Conversely, the smallest surface tested is just the minimal size that allows for selective binding.
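To put the quoted association constant in perspective, it can be converted to a standard binding free energy with the textbook relation ΔG° = −RT ln Ka. The sketch below is not taken from the paper; the temperature of 310.15 K (taken here to stand for "physiological conditions"), the 1 mol/l standard state, and the function name are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(ka, temperature_k=310.15):
    """Standard binding free energy in kcal/mol from Ka in l/mol.

    Assumes a 1 mol/l standard state; 310.15 K is an assumed value
    for physiological temperature.
    """
    return -R_KCAL * temperature_k * math.log(ka)


# A Ka of 1e40 l/mol corresponds to roughly -57 kcal/mol, whereas a
# nanomolar binder (Ka = 1e9 l/mol) sits near -13 kcal/mol.
strong = binding_free_energy(1e40)
typical = binding_free_energy(1e9)
```

On this scale the large-area limit is more than four times the free energy of a typical nanomolar complex, which is consistent with the abstract's statement that such binding is far stronger than what is observed in nature.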
Book
"Molecular Materials with Specific Interactions: Modeling and Design" has a very interdisciplinary character and is intended to provide basic information as well as the details of theory and examples of its application to experimentalists and theoreticians interested in modeling molecular properties and putting into practice rational design of new materials. One of the first requirements to initiate the molecular modeling of molecular materials is an accurate and realistic description of the electronic structure, intermolecular interactions and chemical reactions at microscopic and macroscopic scale. Therefore the first four chapters contain an extensive introduction into the latest theories of intermolecular interactions, functional density techniques, microscopic and mezoscopic modeling techniques as well as first-principle molecular dynamics. In the following chapters, techniques bridging microscopic and mezoscopic modeling scales are presented. The authors then illustrate various successful applications of molecular design of new materials, drugs, biocatalysts, etc. before presenting challenging topics in molecular materials design. This book is an excellent source of information for professionals involved in research in computational chemistry and physics, material science, nanotechnology, rational drug design and molecular biology. It will benefit graduates, as well as undergraduate students exposed to the above research areas.
Chapter
The emergence of bioinformatics has enriched the different dimensions of biological research. A sizable quantity of complex biological data can now be condensed into a more digestible form, particularly in the context of human health. We provide a critical overview of the bioinformatic databases and software that enable deeper insight into the human genome and proteome.
Article
Sophisticated protein self-assemblies have attracted great scientific interest in recent decades due to their various potential applications in substance/signal transmission, biosensors, or disease diagnosis and treatment. The design and construction of proteins into hierarchical nanostructures via self-assembly strategies offer unique advantages in understanding the mechanism of naturally occurring protein assemblies and/or creating various functional biomaterials with advanced properties. This review covers the recent progress and trends in self-assembled hierarchical protein structures and their bio-inspired applications. We initially discuss the design and development of sophisticated protein nanostructures through precisely designed protein–protein interactions. Many intricate protein nanostructures, from quasi-zero-dimensional (0D) polyhedral cages, one-dimensional (1D) strings/rings/tubules, and two-dimensional (2D) crystal sheets/cambered surfaces to three-dimensional (3D) crystalline frameworks/hydrogels, have been constructed through self-assembly of rationally designed proteins. In addition, we show representative achievements in the study of the structure–function relationship for selected protein self-assemblies and highlight the latest research progress in developing artificial light harvesting systems, biological nanoenzyme mimics, intelligent protein nanocarriers, biomimetic protocells, and so on. As expected, protein self-assembly has become a powerful tool for the development of multifarious bioinspired materials with advanced structures and properties.
Chapter
Directed evolution (DE) creates diversity in subsequent rounds of mutagenesis in the quest of increased protein stability, substrate binding, and catalysis. Although this technique does not require any structural/mechanistic knowledge of the system, the frequency of improved mutations is usually low. For this reason, computational tools are increasingly used to focus the search in sequence space, enhancing the efficiency of laboratory evolution. In particular, molecular modeling methods provide a unique tool to grasp the sequence/structure/function relationship of the protein to evolve, with the only condition that a structural model is provided. With this book chapter, we tried to guide the reader through the state of the art of molecular modeling, discussing their strengths, limitations, and directions. In addition, we suggest a possible future template for in silico directed evolution where we underline two main points: a hierarchical computational protocol combining several different techniques and a synergic effort between simulations and experimental validation.
Article
Homologs of the Escherichia coli surE gene are present in many eubacteria and archaea. Despite the evolutionary conservation, little information is available on the structure and function of their gene products. We have determined the crystal structure of the SurE protein from Thermotoga maritima. The structure reveals the dimeric arrangement of the subunits and an active site around a bound metal ion. We also demonstrate that the SurE protein exhibits a divalent metal ion-dependent phosphatase activity that is inhibited by vanadate or tungstate. In the vanadate- and tungstate-complexed structures, the inhibitors bind adjacent to the divalent metal ion. Our structural and functional analyses identify the SurE proteins as a novel family of metal ion-dependent phosphatases.
Article
A series of synthetic receptors capable of binding to the calmodulin-binding domain of calcineurin (CN393-414) was designed, synthesized and characterized. The design was accomplished by docking CN393-414 against a two-helix receptor, using an idealized three-stranded coiled coil as a starting geometry. The sequence of the receptor was chosen using a side-chain re-packing program, which employed a genetic algorithm to select potential binders from a total of 7.5 × 10^6 possible sequences. A total of 25 receptors were prepared, representing 13 sequences predicted by the algorithm as well as 12 related sequences that were not predicted. The receptors were characterized by CD spectroscopy, analytical ultracentrifugation, and binding assays. The receptors predicted by the algorithm bound CN393-414 with apparent dissociation constants ranging from 0.2 µM to >50 µM. Many of the receptors that were not predicted by the algorithm also bound to CN393-414. Methods to circumvent this problem and to improve the automated design of functional proteins are discussed.
Article
The computer-aided design of protein sequences requires efficient search algorithms to handle the enormous combinatorial complexity involved. A variety of different algorithms have now been applied with some success. The choice of algorithm can influence the representation of the problem in several important ways--the discreteness of the configuration, the types of energy terms that can be used and the ability to find the global minimum energy configuration. The use of dead end elimination to design the complete sequence for a small protein motif and the use of genetic and mean-field algorithms to design hydrophobic cores for proteins represent the major themes of the past year.
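As an illustration of the dead end elimination idea mentioned in this abstract, the sketch below applies one pass of the classic elimination criterion to a toy rotamer problem: a rotamer is discarded when even its best-case energy is worse than the worst-case energy of some competitor at the same position. The data layout (`self_e`, `pair_e`) and the function name are hypothetical, and the code is not drawn from any of the programs the review covers.

```python
def dee_eliminate(self_e, pair_e):
    """One pass of the original dead end elimination criterion.

    self_e[i][r]       self energy of rotamer r at position i
    pair_e[i][j][r][s] pair energy of rotamer r at i with rotamer s at j
    Returns, for each position, the rotamer indices that survive.
    """
    n = len(self_e)
    survivors = []
    for i in range(n):
        rots = range(len(self_e[i]))
        others = [j for j in range(n) if j != i]
        # Lower bound: self energy plus the most favourable partner choices.
        low = [self_e[i][r] + sum(min(pair_e[i][j][r]) for j in others)
               for r in rots]
        # Upper bound: self energy plus the least favourable partner choices.
        high = [self_e[i][r] + sum(max(pair_e[i][j][r]) for j in others)
                for r in rots]
        # r is a dead end if some competitor t is always better than r.
        survivors.append([
            r for r in rots
            if not any(low[r] > high[t] for t in rots if t != r)
        ])
    return survivors


# Toy problem: two positions with two rotamers each (arbitrary energies).
self_e = [[0.0, 10.0], [0.0, 1.0]]
p01 = [[-1.0, 0.0], [-2.0, 1.0]]             # position 0 vs position 1
p10 = [[-1.0, -2.0], [0.0, 1.0]]             # transpose of p01
pair_e = [[None, p01], [p10, None]]
remaining = dee_eliminate(self_e, pair_e)    # [[0], [0]]
```

Because the criterion only removes rotamers that cannot belong to the global minimum, the surviving set in this toy case is exactly the lowest-energy combination; in realistic problems the pass is repeated until no further rotamers can be eliminated.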
Article
Recent advances in computational techniques have allowed the design of precise side-chain packing in proteins with predetermined, naturally occurring backbone structures. Because these methods do not model protein main-chain flexibility, they lack the breadth to explore novel backbone conformations. Here the de novo design of a family of α-helical bundle proteins with a right-handed superhelical twist is described. In the design, the overall protein fold was specified by hydrophobic-polar residue patterning, whereas the bundle oligomerization state, detailed main-chain conformation, and interior side-chain rotamers were engineered by computational enumerations of packing in alternate backbone structures. Main-chain flexibility was incorporated through an algebraic parameterization of the backbone. The designed peptides form α-helical dimers, trimers, and tetramers in accord with the design goals. The crystal structure of the tetramer matches the designed structure in atomic detail.
Article
A protein design strategy was developed to specifically enhance the rate of association (k(on)) between a pair of proteins without affecting the rate of dissociation (k(off)). The method is based on increasing the electrostatic attraction between the proteins by incorporating charged residues in the vicinity of the binding interface. The contribution of mutations towards the rate of association was calculated using a newly developed computer algorithm, which accurately predicted the rate of association of mutant protein complexes relative to the wild type. Using this design strategy, the rate of association and the affinity between TEM1 beta-lactamase and its protein inhibitor BLIP were enhanced 250-fold, while the dissociation rate constant was unchanged. The results emphasize that long-range electrostatic forces specifically alter k(on) but do not affect k(off). The design strategy presented here is applicable for increasing rates of association and affinities of protein complexes in general.
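The link between the 250-fold faster association and the 250-fold higher affinity follows from the standard two-state relation between the rate constants and the dissociation constant. The short derivation below is a sketch under that assumption (simple bimolecular binding); the primed symbols denote the redesigned complex and are introduced here for illustration.

```latex
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}},
\qquad
k_{\mathrm{on}}' = 250\,k_{\mathrm{on}},\quad
k_{\mathrm{off}}' = k_{\mathrm{off}}
\;\Longrightarrow\;
K_d' = \frac{k_{\mathrm{off}}}{250\,k_{\mathrm{on}}} = \frac{K_d}{250}
```

With the dissociation rate constant unchanged, any gain in the association rate constant therefore translates one-to-one into a gain in affinity.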
Article
We report the development and initial experimental validation of a computational design procedure aimed at generating enzyme-like protein catalysts called "protozymes." Our design approach utilizes a "compute and build" strategy that is based on the physical/chemical principles governing protein stability and catalytic mechanism. By using the catalytically inert 108-residue Escherichia coli thioredoxin as a scaffold, the histidine-mediated nucleophilic hydrolysis of p-nitrophenyl acetate as a model reaction, and the ORBIT protein design software to compute sequences, an active site scan identified two promising catalytic positions and surrounding active-site mutations required for substrate binding. Experimentally, both candidate protozymes demonstrated catalytic activity significantly above background. One of the proteins, PZD2, displayed "burst" phase kinetics at high substrate concentrations, consistent with the formation of a stable enzyme intermediate. The kinetic parameters of PZD2 are comparable to early catalytic Abs. But, unlike catalytic Ab design, our design procedure is independent of fold, suggesting a possible mechanism for examining the relationships between protein fold and the evolvability of protein function.
Article
Full-text available
The rational design of loops and turns is a key step towards creating proteins with new functions. We used a computational design procedure to create new backbone conformations in the second turn of protein L. The Protein Data Bank was searched for alternative turn conformations, and sequences optimal for these turns in the context of protein L were identified using a Monte Carlo search procedure and an energy function that favors close packing. Two variants containing 12 and 14 mutations were found to be as stable as wild-type protein L. The crystal structure of one of the variants has been solved at a resolution of 1.9 Å, and the backbone conformation in the second turn is remarkably close to that of the in silico model (1.1 Å RMSD) while it differs significantly from that of wild-type protein L (the turn residues are displaced by an average of 7.2 Å). The folding rates of the redesigned proteins are greater than that of the wild-type protein and, in contrast to wild-type protein L, the second beta-turn appears to be formed at the rate-limiting step in folding.
Article
Full-text available
Protein-protein interactions are central to most biological processes. Although much recent effort has been put into methods to identify interacting partners, there has been a limited focus on how these interactions compare with those known from three-dimensional (3D) structures. Because comparison of protein interactions often involves considering homologous, but not identical, proteins, a key issue is whether proteins that are homologous to an interacting pair will interact in the same way, or interact at all. Accordingly, we describe a method to test putative interactions on complexes of known 3D structure. Given a 3D complex and alignments of homologues of the interacting proteins, we assess the fit of any possible interacting pair on the complex by using empirical potentials. For studies of interacting protein families that show different specificities, the method provides a ranking of interacting pairs useful for prioritizing experiments. We evaluate the method on interacting families of proteins with multiple complex structures. We then consider the fibroblast growth factor/receptor system and explore the intersection between complexes of known structure and interactions proposed between yeast proteins by methods such as two-hybrids. We provide confirmation for several interactions, in addition to suggesting molecular details of how they occur.
Article
Full-text available
We have designed de novo 13 divergent spectrin SH3 core sequences to determine their folding properties. Kinetic analysis of the variants with stability similar to that of the wild-type protein shows accelerated unfolding and refolding rates compatible with a preferential stabilization of the transition state. This is most likely caused by conformational strain in the native state, as deletion of a methyl group (Ile→Val) leads to deceleration in unfolding and increased stability (up to 2 kcal mol⁻¹). Several of these Ile→Val mutants have negative phi(-U) values, indicating that some noncanonical phi(-U) values might result from conformational strain. Thus, producing a stable protein does not necessarily mean that the design process has been entirely successful. Strained interactions could have been introduced, and a reduction in the buried volume could result in a large increase in stability and a reduction in unfolding rates.
Article
Full-text available
PDZ domains are small globular domains that recognize the last 4-7 amino acids at the C-terminus of target proteins. The specificity of the PDZ-ligand recognition is due to side chain-side chain interactions, as well as the positioning of an alpha-helix involved in ligand binding. We have used computer-aided protein design to produce mutant versions of a Class I PDZ domain that bind to novel Class I and Class II target sequences both in vitro and in vivo, thus providing an alternative to primary antibodies in western blotting, affinity chromatography and pull-down experiments. Our results suggest that by combining different backbone templates with computer-aided protein design, PDZ domains could be engineered to specifically recognize a large number of proteins.
Article
The change in free energy of binding of hen egg white lysozyme (HEL) to the antibody HyHel-10 arising from ten point mutations in HEL (D101K, D101G, K96M, K97D, K97G, K97G, R21E, R21K, W62Y, and W63Y) was calculated using a combination of the finite difference Poisson-Boltzmann method for the electrostatic contribution, a solvent accessible surface area term for the non-polar contribution, and rotamer counting for the sidechain entropy contribution. Comparison of experimental and calculated results indicates that, because of pKa shifts in some of the mutated residues, primarily those involving aspartate or glutamate, proton uptake or release occurs in binding. When this effect was incorporated into the binding free energy calculations, the agreement with experiment improved significantly, and resulted in a mean error of about 1.9 kcal/mole. Thus these calculations predict that there should be a significant pH dependence to the change in binding caused by these mutations. The other major contributions to binding energy changes come from solvation and charge-charge interactions, which tend to oppose each other. Smaller contributions come from nonpolar interactions and sidechain entropy changes. The structures of the HyHel-10-HEL complexes with mutant HEL were obtained by modeling, and the effect of the modeled structure on the calculations was also examined. “Knowledge based” modeling and automatic generation of models using molecular mechanics produced comparable results. Proteins 33:39–48, 1998. © 1998 Wiley-Liss, Inc.
Article
The conformations of proteins and protein−protein complexes observed in nature must be low in free energy relative to alternative (not observed) conformations, and it is plausible (but not absolutely necessary) that the electrostatic free energies of experimentally observed conformations are also low relative to other conformations. Starting from this assumption, we evaluate alternative models of electrostatic interactions in proteins by comparing the electrostatic free energies of native, nativelike, and non-native structures. We observe that the total electrostatic free energy computed using the Poisson−Boltzmann (PB) equation or the generalized Born (GB) model exhibits free energy gaps that are comparable to, or smaller than, the free energy gaps resulting from Coulomb interactions alone. Detailed characterization of the contributions of different atom types to the total electrostatic free energy showed that, although for most atoms unfavorable solvation energies associated with atom burial are more than compensated by attractive Coulomb interactions, Coulomb interactions do not become more favorable with burial for certain backbone atom types, suggesting inaccuracies in the treatment of backbone electrostatics. Sizable free energy gaps are obtained using simple distance-dependent dielectric models, suggesting their usefulness in approximating the attenuation of long range Coulomb interactions by induced polarization effects. Hydrogen bonding interactions appear to be better modeled with an explicitly orientation-dependent hydrogen bonding potential than with any of the purely electrostatic models of hydrogen bonds, as there are larger free energy gaps with the former. Finally, a combined electrostatics−hydrogen bonding potential is developed that appears to better capture the free energy differences between native, nativelike, and non-native proteins and protein−protein complexes than electrostatic or hydrogen bonding models alone.
Article
Noncovalent interactions are important in many physiological processes of complexation which involve all components of living cells. Here we report an approach to computationally study the interaction free energies in protein−protein complexes which allows, from a single simulation, an estimate of the individual contribution of each residue to the binding. We developed this new technique, computational alanine scanning, and applied it to study the interactions of the oncoprotein Mdm2 with the N-terminal stretch of the tumor suppressor protein p53. Excellent agreement has been found between the calculated and experimental data. This residue mutation methodology could prove to be a useful general design tool for molecules (nucleotides, peptides, lipids, or any other organic compound) optimized for interactions or stability, since one can qualitatively estimate the free energy consequences of many mutations from a single molecular dynamics trajectory.
Article
We have developed a procedure to predict the peptide binding specificity of an SH3 domain from its sequence. The procedure utilizes information extracted from position-specific contacts derived from six SH3/peptide or SH3/protein complexes of known structure. The framework of SH3/peptide contacts defined on the structure of the complexes is used to build a residue-residue interaction database derived from ligands obtained by panning peptide libraries displayed on filamentous phage. The SH3-specific interaction database is a multidimensional array containing frequencies of position-specific contacts. As input, SH3-SPOT requires the sequence of an SH3 domain and of a query decapeptide ligand. The array, which we call the SH3-specific matrix, is then used to evaluate the probability that the peptide would bind the given SH3 domain. This procedure is fast enough to be applied to the entire protein sequence database. Panning experiments were performed to search putative specific ligands of different SH3 domains in a database of decapeptides, or in a database of protein sequences. The procedure ranked some of the natural partners of interaction of a number of SH3 domains among the best ligands of the ∼5.6 × 10^9 different decapeptides in the SWISSPROT database. We expect the predictive power of the method to increase with the enrichment of the SH3-specific matrix by interaction data derived from new complex structures or from the characterization of new ligands. The procedure was developed using the SH3 domain family as a test case, but its application can easily be extended to other families of protein domains (such as SH2, MHC, EH, PDZ, etc.).
Article
Protein design has become a powerful approach for understanding the relationship between amino acid sequence and 3-dimensional structure. In the past 5 years, there have been many breakthroughs in the development of computational methods that allow the selection of novel sequences given the structure of a protein backbone. Successful design of protein scaffolds has now paved the way for new endeavors to design function. The ability to design sequences compatible with a fold may also be useful in structural and functional genomics by expanding the range of proteins used for fold recognition and for the identification of functionally important domains from multiple sequence alignments.
Article
In an accompanying paper a computational procedure is described, which introduces new ligand-binding sites into proteins of known structure. Here we describe the experimental implementation of one of the designs, which is intended to introduce a copper-binding site into Escherichia coli thioredoxin. The new binding site can be introduced with a minimum of four amino acid changes. The binding site is buried so that structural rules for making mutations in the hydrophobic core of a protein, as well as for the introduction of new functions, are being tested in this experiment. The mutant protein is folded even in the absence of metals, and variants that retain the original activity of thioredoxin can be isolated. The protein has gained a metal-binding site specific for transition metals. The metal co-ordination chemistry at the binding site varies depending on the metal that is introduced into it. Mercury(II) is co-ordinated in the expected manner. Copper(II) binds in a way that was not anticipated in the original design. It appears to use two of the four residues intended to form the co-ordination sphere, and two other residues that were not part of the original set of mutations. It is therefore necessary not only to introduce new functional groups to form a new site, but also to consider and remove alternative modes of binding.
Article
A major revival in the use of classical electrostatics as an approach to the study of charged and polar molecules in aqueous solution has been made possible through the development of fast numerical and computational methods to solve the Poisson-Boltzmann equation for solute molecules that have complex shapes and charge distributions. Graphical visualization of the calculated electrostatic potentials generated by proteins and nucleic acids has revealed insights into the role of electrostatic interactions in a wide range of biological phenomena. Classical electrostatics has also proved to be a successful quantitative tool, yielding accurate descriptions of electrical potentials, diffusion-limited processes, pH-dependent properties of proteins, ionic strength-dependent phenomena, and the solvation free energies of organic molecules.
Article
We have developed and experimentally tested a novel computational approach for the de novo design of hydrophobic cores. A pair of computer programs has been written, the first of which creates a “custom” rotamer library for potential hydrophobic residues, based on the backbone structure of the protein of interest. The second program uses a genetic algorithm to globally optimize for a low energy core sequence and structure, using the custom rotamer library as input. Success of the programs in predicting the sequences of native proteins indicates that they should be effective tools for protein design. Using these programs, we have designed and engineered several variants of the phage 434 cro protein, containing five, seven, or eight sequence changes in the hydrophobic core. As controls, we have produced a variant consisting of a randomly generated core with six sequence changes but equal volume relative to the native core and a variant with a “minimalist” core containing predominantly leucine residues. Two of the designs, including one with eight core sequence changes, have thermal stabilities comparable to the native protein, whereas the third design and the minimalist protein are significantly destabilized. The randomly designed control is completely unfolded under equivalent conditions. These results suggest that rational de novo design of hydrophobic cores is feasible, and stress the importance of specific packing interactions for the stability of proteins. A surprising aspect of the results is that all of the variants display highly cooperative thermal denaturation curves and reasonably dispersed NMR spectra. This suggests that the non‐core residues of a protein play a significant role in determining the uniqueness of the folded structure.
Article
The first fully automated design and experimental validation of a novel sequence for an entire protein is described. A computational design algorithm based on physical chemical potential functions and stereochemical constraints was used to screen a combinatorial library of 1.9 × 10^27 possible amino acid sequences for compatibility with the design target, a ββα protein motif based on the polypeptide backbone structure of a zinc finger domain. A BLAST search shows that the designed sequence, full sequence design 1 (FSD-1), has very low identity to any known protein sequence. The solution structure of FSD-1 was solved by nuclear magnetic resonance spectroscopy and indicates that FSD-1 forms a compact well-ordered structure, which is in excellent agreement with the design target structure. This result demonstrates that computational methods can perform the immense combinatorial search required for protein design, and it suggests that an unbiased and quantitative algorithm can be used in various structural contexts.
Article
Binding of one protein to another is involved in nearly all biological functions, yet the principles governing the interaction of proteins are not fully understood. To analyze the contributions of individual amino acid residues in protein-protein binding we have compiled a database of 2325 alanine mutants for which the change in free energy of binding upon mutation to alanine has been measured (available at http://motorhead.ucsf.edu/thorn/hotspot). Our analysis shows that at the level of side-chains there is little correlation between buried surface area and free energy of binding. We find that the free energy of binding is not evenly distributed across interfaces; instead, there are hot spots of binding energy made up of a small subset of residues in the dimer interface. These hot spots are enriched in tryptophan, tyrosine and arginine, and are surrounded by energetically less important residues that most likely serve to occlude bulk solvent from the hot spot. Occlusion of solvent is found to be a necessary condition for highly energetic interactions.
Article
The non-covalent assembly of proteins that fold separately is central to many biological processes, and differs from the permanent macromolecular assembly of protein subunits in oligomeric proteins. We performed an analysis of the atomic structure of the recognition sites seen in 75 protein-protein complexes of known three-dimensional structure: 24 protease-inhibitor, 19 antibody-antigen and 32 other complexes, including nine enzyme-inhibitor and 11 that are involved in signal transduction. The size of the recognition site is related to the conformational changes that occur upon association. Of the 75 complexes, 52 have "standard-size" interfaces in which the total area buried by the components in the recognition site is 1600 (±400) Å². In these complexes, association involves only small changes of conformation. Twenty complexes have "large" interfaces burying 2000 to 4660 Å², and large conformational changes are seen to occur in those cases where we can compare the structure of complexed and free components. The average interface has approximately the same non-polar character as the protein surface as a whole, and carries somewhat fewer charged groups. However, some interfaces are significantly more polar and others more non-polar than the average. Of the atoms that lose accessibility upon association, half make contacts across the interface and one-third become fully inaccessible to the solvent. In the latter case, the Voronoi volume was calculated and compared with that of atoms buried inside proteins. The ratio of the two volumes was 1.01 (±0.03) in all but 11 complexes, which shows that atoms buried at protein-protein interfaces are close-packed like the protein interior. This conclusion could be extended to the majority of interface atoms by including solvent positions determined in high-resolution X-ray structures in the calculation of Voronoi volumes. Thus, water molecules contribute to the close-packing of atoms that ensure complementarity between the two protein surfaces, as well as providing polar interactions between the two proteins.
Article
We analyzed the atomic models of 75 X-ray structures of protein-nucleic acid complexes with the aim of uncovering common properties. The interface area measured the extent of contact between the protein and nucleic acid. It was found to vary between 1120 and 5800 Å². Despite this wide variation, the interfaces in complexes of transcription factors with double-stranded DNA could be broken up into recognition modules where 12 ± 3 nucleotides on the DNA side contact 24 ± 6 amino acids on the protein side, with interface areas in the range 1600 ± 400 Å². For enzymes acting on DNA, the recognition module is on average 600 Å² larger, due to the requirement of making an active site. As judged by its chemical and amino acid composition, the average protein surface in contact with the DNA is more polar than the solvent accessible surface or the typical protein-protein interface. The protein side is rich in positively charged groups from lysine and arginine side chains; on the DNA side the negative charges from phosphate groups dominate. Hydrogen bonding patterns were also analyzed, and we found one intermolecular hydrogen bond per 125 Å² of interface area in high-resolution structures. An equivalent number of polar interactions involved water molecules, which are generally abundant at protein-DNA interfaces. Calculations of Voronoi atomic volumes, performed in the presence and absence of water molecules, showed that protein atoms buried at the interface with DNA are on average as closely packed as in the protein interior. Water molecules contribute to the close packing, thereby mediating shape complementarity. Finally, conformational changes accompanying association were analyzed in 24 of the complexes for which the structure of the free protein was also available. On the DNA side the extent of deformation showed some correlation with the size of the interface area. On the protein side the type and size of the structural changes spanned a wide spectrum. Disorder-to-order transitions, domain movements, quaternary and tertiary changes were observed, and the largest changes occurred in complexes with large interfaces.
Article
Conformational changes on complex formation have been measured for 39 pairs of structures of complexed proteins and unbound equivalents, averaged over interface and non-interface regions and for individual residues. We evaluate their significance by comparison with the differences seen in 12 pairs of independently solved structures of identical proteins, and find that just over half have some substantial overall movement. Movements involve main chains as well as side chains, and large changes in the interface are closely involved with complex formation, while those of exposed non-interface residues are caused by flexibility and disorder. Interface movements in enzymes are similar in extent to those of inhibitors. All eight of the complexes (six enzyme–inhibitor and two antibody–antigen) that have structures of both components in an unbound form available show some significant interface movement. However, predictive docking is successful even when some of the largest changes occur. We note however that the situation may be different in systems other than the enzyme–inhibitors which dominate this study. Thus the general model is induced fit but, because there is only limited conformational change in many systems, recognition can be treated as lock and key to a first approximation.
Article
A 'protein design cycle', involving cycling between theory and experiment, has led to recent advances in rational protein design. A reductionist approach, in which protein positions are classified by their local environments, has aided development of an appropriate energy expression. The computational principles and practicalities of the protein design cycle are discussed.
Article
We have developed a computational approach for the design and prediction of hydrophobic cores that includes explicit backbone flexibility. The program consists of a two-stage combination of a genetic algorithm and Monte Carlo sampling using a torsional model of the protein. Backbone structures are evaluated either by a canonical force field or a constraining potential that emphasizes the preservation of local geometry. The utility of the method for protein design and engineering is explored by designing three novel hydrophobic core variants of the protein 434 cro. We use the new method to evaluate these and previously designed 434 cro variants, as well as a series of phage T4 lysozyme variants. In order to properly evaluate the influence of backbone flexibility, we have also analyzed the effects of varying amounts of side-chain flexibility on the performance of fixed backbone methods. Comparison of results using a fixed versus flexible backbone reveals that, surprisingly, the two methods are almost equivalent in their abilities to predict relative experimental stabilities, but only when full side-chain flexibility is allowed. The prediction of core side-chain structure can vary dramatically between methods. In some, but not all, cases the flexible backbone method is a better predictor of structure. The development of a flexible backbone approach to core design is particularly important for attempts at de novo protein design, where there is no prior knowledge of a precise backbone structure.
Article
In this work we describe the rational design of two-helix coiled-coil peptide mimetics of interleukin-4 (IL-4) which are able to recognize and bind its high-affinity receptor (IL-4R alpha). We have used the leucine-zipper domain of the yeast transcription factor GCN4 as a scaffold into which the putative binding epitope of IL-4 for IL-4R alpha was transferred in a stepwise manner, using computer-aided molecular modeling. The resulting molecules bind IL-4R alpha with affinities ranging from 2 mM to 5 μM, depending on the fraction of the IL-4 binding site incorporated and on their stability. To our knowledge this is the first time a molecule capable of binding a cytokine receptor has been successfully designed in a rational manner.
Article
Recent successes in protein design have illustrated the promise of computational approaches. These methods rely on energy expressions to evaluate the quality of different amino acid sequences for target protein structures. The force fields optimized for design differ from those typically used in molecular mechanics and molecular dynamics calculations.
Article
Assumptions of restricted flexibility upon binding conflict with emerging data showing that motion can increase, decrease or stay the same within molecular complexes. Now, calculations of entropic contributions from dynamics at specific positions in a complex suggest that increases in motion can dominate the free energy of association in certain cases.
Article
Water molecules are found in abundance in protein-protein and protein-DNA interfaces. Although interface solvent molecules exchange quickly with the bulk solvent, structural and biochemical data suggest that water-mediated interactions are as important as direct hydrogen bonds in the stability and specificity of recognition.
Article
Finding the minimum energy amino acid side-chain conformation is a fundamental problem in both homology modeling and protein design. To address this issue, numerous computational algorithms have been proposed. However, there have been few quantitative comparisons between methods and there is very little general understanding of the types of problems that are appropriate for each algorithm. Here, we study four common search techniques: Monte Carlo (MC) and Monte Carlo plus quench (MCQ); genetic algorithms (GA); self-consistent mean field (SCMF); and dead-end elimination (DEE). Both SCMF and DEE are deterministic, and if DEE converges, it is guaranteed that its solution is the global minimum energy conformation (GMEC). This provides a means to compare the accuracy of SCMF and the stochastic methods. For the side-chain placement calculations, we find that DEE rapidly converges to the GMEC in all the test cases. The other algorithms converge on significantly incorrect solutions; the average fraction of incorrect rotamers for SCMF is 0.12, GA 0.09, and MCQ 0.05. For the protein design calculations, design positions are progressively added to the side-chain placement calculation until the time required for DEE diverges sharply. As the complexity of the problem increases, the accuracy of each method is determined so that the results can be extrapolated into the region where DEE is no longer tractable. We find that both SCMF and MCQ perform reasonably well on core calculations (fraction amino acids incorrect is SCMF 0.07, MCQ 0.04), but fail considerably on the boundary (SCMF 0.28, MCQ 0.32) and surface calculations (SCMF 0.37, MCQ 0.44).
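The comparison described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the energies are random numbers rather than a physical force field, the exact minimum is found by brute-force enumeration as a stand-in for DEE (feasible only because the toy problem is tiny), and all function names and parameters (`make_toy_problem`, `monte_carlo_quench`, the annealing schedule) are hypothetical.

```python
import itertools
import math
import random


def total_energy(assignment, self_e, pair_e):
    """Rotamer self energies plus pairwise energies over all position pairs."""
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy


def make_toy_problem(n_pos, n_rot, rng):
    """Random self and pair energy tables (assumption: not a real force field)."""
    self_e = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = {
        (i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
        for i in range(n_pos) for j in range(i + 1, n_pos)
    }
    return self_e, pair_e


def brute_force_gmec(self_e, pair_e):
    """Exact global minimum by enumeration (stand-in for DEE on tiny problems)."""
    n_rot = len(self_e[0])
    best = min(itertools.product(range(n_rot), repeat=len(self_e)),
               key=lambda a: total_energy(a, self_e, pair_e))
    return list(best), total_energy(best, self_e, pair_e)


def quench(assignment, self_e, pair_e):
    """Greedy descent: accept single-rotamer changes until none lowers the energy."""
    current = list(assignment)
    energy = total_energy(current, self_e, pair_e)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for r in range(len(self_e[i])):
                trial = current[:]
                trial[i] = r
                trial_e = total_energy(trial, self_e, pair_e)
                if trial_e < energy - 1e-12:
                    current, energy, improved = trial, trial_e, True
    return current, energy


def monte_carlo_quench(self_e, pair_e, rng, steps=2000, t_start=2.0, t_end=0.05):
    """Metropolis Monte Carlo with geometric cooling, then a quench of the best state."""
    n_pos, n_rot = len(self_e), len(self_e[0])
    current = [rng.randrange(n_rot) for _ in range(n_pos)]
    energy = total_energy(current, self_e, pair_e)
    best, best_e = current[:], energy
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        trial = current[:]
        trial[rng.randrange(n_pos)] = rng.randrange(n_rot)
        trial_e = total_energy(trial, self_e, pair_e)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if trial_e <= energy or rng.random() < math.exp(-(trial_e - energy) / temp):
            current, energy = trial, trial_e
            if energy < best_e:
                best, best_e = current[:], energy
    return quench(best, self_e, pair_e)


def fraction_incorrect(assignment, reference):
    """Fraction of positions whose rotamer differs from the reference solution."""
    return sum(a != b for a, b in zip(assignment, reference)) / len(reference)


if __name__ == "__main__":
    rng = random.Random(0)
    self_e, pair_e = make_toy_problem(n_pos=6, n_rot=3, rng=rng)
    gmec, gmec_e = brute_force_gmec(self_e, pair_e)
    mcq, mcq_e = monte_carlo_quench(self_e, pair_e, rng)
    print("GMEC energy:", gmec_e, "MCQ energy:", mcq_e)
    print("fraction incorrect rotamers:", fraction_incorrect(mcq, gmec))
```

The fraction of incorrect rotamers relative to the exact solution is the same kind of accuracy measure the abstract reports for SCMF, GA, and MCQ; on a problem this small the stochastic search will often recover the exact minimum, so the sketch shows the bookkeeping rather than the reported error rates.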
Article
How large is the volume of sequence space that is compatible with a given protein structure? Starting from random sequences, low free energy sequences were generated for 108 protein backbone structures by using a Monte Carlo optimization procedure and a free energy function based primarily on Lennard-Jones packing interactions and the Lazaridis-Karplus implicit solvation model. Remarkably, in the designed sequences 51% of the core residues and 27% of all residues were identical to the amino acids in the corresponding positions in the native sequences. The lowest free energy sequences obtained for ensembles of native-like backbone structures were also similar to the native sequence. Furthermore, both the individual residue frequencies and the covariances between pairs of positions observed in the very large SH3 domain family were recapitulated in core sequences designed for SH3 domain structures. Taken together, these results suggest that the volume of sequence space optimal for a protein structure is surprisingly restricted to a region around the native sequence.
Article
De novo protein design has proven to be a powerful tool for understanding protein folding, structure, and function. In this Account, we highlight aspects of our research on the design of dimeric, four-helix bundles. Dimeric, four-helix bundles are found throughout nature, and the history of their design in our laboratory illustrates our hierarchic approach to protein design. This approach has been successfully applied to create a completely native-like protein. Structural and mutational analysis allowed us to explore the determinants of native protein structure. These determinants were then applied to the design of a dinuclear metal-binding protein that can now serve as a model for this important class of proteins.
Article
We used a novel charge optimization technique to study the small ribonuclease barnase and to analyze its interaction with a natural tight binding inhibitor, the protein barstar. The approach uses a continuum model to explicitly determine the charge distributions that lead to the most favorable electrostatic contribution to binding when competing desolvation and interaction effects are included. Given its backbone fold, barstar is electrostatically optimized for tight binding to barnase when compared with mutants where residues have been substituted with one of the 20 common amino acids. Natural proteins thus appear to use optimization of electrostatic interactions as one strategy for achieving tight binding.
Article
A fundamental test of our current understanding of protein folding is to rationally redesign protein folding pathways. We use a computer-based design strategy to switch the folding pathway of protein G, which normally involves formation of the second, but not the first, beta-turn at the rate-limiting step in folding. Backbone conformations and amino acid sequences that maximize the interaction density in the first beta-hairpin were identified, and two variants containing 11 amino acid replacements were found to be approximately 4 kcal mol⁻¹ more stable than wild-type protein G. Kinetic studies show that the redesigned proteins fold approximately 100-fold faster than wild-type protein and that the first beta-turn is formed and the second disrupted at the rate-limiting step in folding.
Article
We have developed a new method for the prediction of peptide sequences that bind to a protein, given a three-dimensional structure of the protein in complex with a peptide. By applying a recently developed sequence prediction algorithm and a novel ensemble averaging calculation, we generate a diverse collection of peptide sequences that are predicted to have significant affinity for the protein. Using output from the simulations, we create position-specific scoring matrices, or virtual interaction profiles (VIPs). Comparison of VIPs for a collection of binding motifs to sequences determined experimentally indicates that the prediction algorithm is accurate and applicable to a diverse range of structures. With these VIPs, one can scan protein sequence databases rapidly to seek binding partners of potential biological significance. Overall, this method can significantly enhance the information contained within a protein-peptide crystal structure, and enrich the data obtained by experimental selection methods such as phage display.
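To illustrate what a position-specific scoring matrix does, the sketch below builds a log-odds matrix from a set of equal-length peptides and slides it along a longer sequence. It is a generic PSSM under stated assumptions (uniform background frequencies, a pseudocount of 1, made-up peptides), not the VIP construction or ensemble averaging used by the authors. The names `build_pssm` and `scan` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Log2-odds score per position and residue, assuming a uniform background."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def scan(pssm, sequence):
    """Score every window of the sequence; return (score, start, window), best first."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[aa] for col, aa in zip(pssm, window))
        hits.append((score, start, window))
    return sorted(hits, reverse=True)
```

Scanning a database then amounts to calling `scan` on each sequence and keeping the windows above a chosen score threshold. The input sequence is assumed to contain only the 20 standard amino acid letters.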
Article
Chemical genetic methods allow signal transduction pathways to be probed in a domain-specific manner. This subtle perturbation of function, when combined with classical genetic and biochemical data, allows for a better understanding of protein function. This in turn is leading to elucidation of pharmacological maps of signaling pathways. Recent studies have focused on diverse pathways, including the initiation of actin polymerization, oncogenic tyrosine kinase control of cell transformation, and molecular motor involvement in adaptation of sensory cells of the inner ear.
Article
The MM-PBSA (Molecular Mechanics-Poisson-Boltzmann surface area) method was applied to the human Growth Hormone (hGH) complexed with its receptor to assess both the validity and the limitations of the computational alanine scanning approach. A 400-ps dynamical trajectory of the fully solvated complex was simulated at 300 K in a 101 Å × 81 Å × 107 Å water box using periodic boundary conditions. Long-range electrostatic interactions were treated with the particle mesh Ewald (PME) summation method. Equally spaced snapshots along the trajectory were chosen to compute the binding free energy using a continuum solvation model to calculate the electrostatic desolvation free energy and a solvent-accessible surface area approach to treat the nonpolar solvation free energy. Computational alanine scanning was performed on the same set of snapshots by mutating the residues in the structural epitope of the hormone and the receptor to alanine and recomputing the ΔG_binding. To further investigate a particular structure, a 200-ps dynamical trajectory of an R43A hormone-receptor complex was simulated. By postprocessing a single trajectory of the wild-type complex, the average unsigned error of our calculated ΔΔG_binding is approximately 1 kcal/mol for the alanine mutations of hydrophobic residues and polar/charged residues without buried salt bridges. When residues involved in buried salt bridges are mutated to alanine, it is demonstrated that a separate trajectory of the alanine mutant complex can lead to reasonable agreement with experimental results. Our approach can be extended to rapid screening of a variety of possible modifications to binding sites.
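The bookkeeping behind computational alanine scanning can be sketched in a few lines, assuming each snapshot already carries free energy estimates for the complex, the receptor, and the ligand. The dictionary keys, function names, and any numbers fed to them are hypothetical; the actual MM-PBSA energy evaluation (molecular mechanics, Poisson-Boltzmann, and surface area terms) is not reproduced here.

```python
from statistics import mean


def binding_free_energy(snapshots):
    """Average over snapshots of G(complex) - G(receptor) - G(ligand), in kcal/mol."""
    return mean(s["complex"] - s["receptor"] - s["ligand"] for s in snapshots)


def alanine_scan_ddg(wild_type_snapshots, mutant_snapshots):
    """ddG_binding = dG_binding(alanine mutant) - dG_binding(wild type)."""
    return binding_free_energy(mutant_snapshots) - binding_free_energy(wild_type_snapshots)


def average_unsigned_error(calculated, experimental):
    """Mean absolute deviation between calculated and experimental ddG values."""
    return mean(abs(calculated[k] - experimental[k]) for k in calculated)
```

In the single-trajectory protocol described above, the mutant snapshots would be the wild-type frames with the chosen side chain truncated to alanine, so both lists cover the same set of frames.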
Article
We have developed a computer algorithm, FOLDEF (for FOLD-X energy function), to provide a fast and quantitative estimation of the importance of the interactions contributing to the stability of proteins and protein complexes. The predictive power of FOLDEF was tested on a very large set of point mutants (1088 mutants) spanning most of the structural environments found in proteins. FOLDEF uses a full atomic description of the structure of the proteins. The different energy terms taken into account in FOLDEF have been weighted using empirical data obtained from protein engineering experiments. First, we considered a training database of 339 mutants in nine different proteins and optimised the set of parameters and weighting factors that best accounted for the changes in stability of the mutants. The predictive power of the method was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein-protein complex mutants. The global correlation obtained for 95% of the entire mutant database (1030 mutants) is 0.83 with a standard deviation of 0.81 kcal mol⁻¹ and a slope of 0.76. The present energy function uses a minimum of computational resources and can therefore easily be used in protein design algorithms, and in the field of protein structure and folding pathways prediction where one requires a fast and accurate energy function. FOLDEF is available via a web-interface at http://fold-x.embl-heidelberg.de
Article
The progress achieved by several groups in the field of computational protein design shows that successful design methods include two major features: efficient algorithms to deal with the combinatorial exploration of sequence space and optimal energy functions to rank sequences according to their fitness for the given fold.
Article
We report the computational redesign of the protein-binding interface of calmodulin (CaM), a small, ubiquitous Ca²⁺-binding protein that is known to bind to and regulate a variety of functionally and structurally diverse proteins. The CaM binding interface was optimized to improve binding specificity towards one of its natural targets, smooth muscle myosin light chain kinase (smMLCK). The optimization was performed using optimization of rotamers by iterative techniques (ORBIT), a protein design program that utilizes a physically based force-field and the Dead-End Elimination theorem to compute sequences that are optimal for a given protein scaffold. Starting from the structure of the CaM-smMLCK complex, the program considered 10²² amino acid residue sequences to obtain the lowest-energy CaM sequence. The resulting eightfold mutant, CaM_8, was constructed and tested for binding to a set of seven CaM target peptides. CaM_8 displayed high binding affinity to the smMLCK peptide (1.3 nM), similar to that of the wild-type protein (1.8 nM). The affinity of CaM_8 to six other target peptides was reduced, as intended, by 1.5-fold to 86-fold. Hence, CaM_8 exhibited increased binding specificity, preferring the smMLCK peptide to the other targets. Studies of this type may increase our understanding of the origins of binding specificity in protein-ligand complexes and may provide valuable information that can be used in the design of novel protein receptors and/or ligands.
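The affinities quoted above can be put on a free energy scale with the standard relation ΔΔG = RT ln(Kd,variant / Kd,reference). The sketch below applies it to the reported smMLCK dissociation constants (1.3 nM and 1.8 nM) and to the largest reported affinity reduction (86-fold). The temperature of 298.15 K is an assumption, since the abstract does not state the measurement temperature, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_variant, kd_reference, temperature=298.15):
    """Change in binding free energy (kcal/mol); positive means weaker binding.

    The default temperature is an assumption, not a value from the source.
    """
    return R_KCAL * temperature * math.log(kd_variant / kd_reference)


# CaM_8 versus wild type on the smMLCK peptide (values from the abstract).
ddg_smmlck = ddg_from_kd(1.3e-9, 1.8e-9)

# An 86-fold loss of affinity, expressed as a Kd ratio of 86.
ddg_86_fold = ddg_from_kd(86.0, 1.0)
```

Under these assumptions the smMLCK change is a fraction of a kcal/mol, while the 86-fold reduction corresponds to between 2 and 3 kcal/mol, which is why the design reads as a gain in specificity rather than in affinity.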
Article
We have generated an artificial highly specific endonuclease by fusing domains of homing endonucleases I-DmoI and I-CreI and creating a new 1400 Å² protein interface between these domains. Protein engineering was accomplished by combining computational redesign and an in vivo protein-folding screen. The resulting enzyme, E-DreI (Engineered I-DmoI/I-CreI), binds a long chimeric DNA target site with nanomolar affinity, cleaving it precisely at a rate equivalent to its natural parents. The structure of an E-DreI/DNA complex demonstrates the accuracy of the protein interface redesign algorithm and reveals how catalytic function is maintained during the creation of the new endonuclease. These results indicate that it may be possible to generate novel highly specific DNA binding proteins from homing endonucleases.
Article
Specific protein-protein interactions are crucial in signaling networks and for the assembly of multi-protein complexes, and represent a challenging goal for protein design. Optimizing interaction specificity requires both positive design, the stabilization of a desired interaction, and negative design, the destabilization of undesired interactions. Currently, no automated protein-design algorithms use explicit negative design to guide a sequence search. We describe a multi-state framework for engineering specificity that selects sequences maximizing the transfer free energy of a protein from a target conformation to a set of undesired competitor conformations. To test the multi-state framework, we engineered coiled-coil interfaces that direct the formation of either homodimers or heterodimers. The algorithm identified three specificity motifs that have not been observed in naturally occurring coiled coils. In all cases, experimental results confirm the predicted specificities.
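One way to picture the multi-state objective is as a score that rewards a low energy in the target state relative to a Boltzmann-weighted ensemble of competitor states. The sketch below is an assumption about how such a score could be written, not the authors' algorithm: the function names, the value of `kT` (roughly room temperature in kcal/mol), and the example energies are all hypothetical.

```python
import math


def transfer_free_energy(e_target, e_competitors, kT=0.593):
    """Free energy of the competitor ensemble minus the target-state energy.

    Larger values mean the sequence more strongly prefers the target state.
    """
    m = min(e_competitors)
    # Log-sum-exp, shifted by the minimum for numerical stability.
    g_competitors = m - kT * math.log(
        sum(math.exp(-(e - m) / kT) for e in e_competitors)
    )
    return g_competitors - e_target


def pick_most_specific(candidates):
    """candidates maps a sequence name to (target energy, list of competitor energies)."""
    return max(candidates, key=lambda name: transfer_free_energy(*candidates[name]))


# Hypothetical energies in kcal/mol: seqA is most stable in the target state,
# but seqB has the larger gap to its competitor states.
candidates = {
    "seqA": (-10.0, [-9.5, -9.0]),
    "seqB": (-9.0, [-4.0, -3.0]),
}
best = pick_most_specific(candidates)
```

The example shows why positive design alone can mislead: the sequence with the lowest target energy is not the one selected once competitor states enter the score.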
Article
During the past two years, significant advances have been made in the development of NMR methods for studying biomolecular dynamics on the microsecond to millisecond timescale. Applications of these methods to biologically relevant systems have provided compelling evidence that, in many cases, conformational dynamics on these timescales govern the rates of biomolecular recognition and catalysis.
Article
The large number of protein kinases makes it impractical to determine their specificities and substrates experimentally. Using the available crystal structures, molecular modeling, and sequence analyses of kinases and substrates, we developed a set of rules governing the binding of a heptapeptide substrate motif (surrounding the phosphorylation site) to the kinase and implemented these rules in a web-interfaced program for automated prediction of optimal substrate peptides, taking only the amino acid sequence of a protein kinase as input. We show the utility of the method by analyzing yeast cell cycle control and DNA damage checkpoint pathways. Our method is the only available predictive method generally applicable for identifying possible substrate proteins for protein serine/threonine kinases and helps in silico construction of signaling pathways. The accuracy of prediction is comparable to the accuracy of data from systematic large-scale experimental approaches.
Article
How scaffold proteins control information flow in signaling pathways is poorly understood: Do they simply tether components, or do they precisely orient and activate them? We found that the yeast mitogen-activated protein (MAP) kinase scaffold Ste5 is tolerant to major stereochemical perturbations; heterologous protein interactions could functionally replace native kinase recruitment interactions, indicating that simple tethering is largely sufficient for scaffold-mediated signaling. Moreover, by engineering a scaffold that tethers a unique kinase set, we could create a synthetic MAP kinase pathway with non-natural input-output properties. These findings demonstrate that scaffolds are highly flexible organizing factors that can facilitate pathway evolution and engineering.
Article
Hydrogen bonding is a key contributor to the specificity of intramolecular and intermolecular interactions in biological systems. Here, we develop an orientation-dependent hydrogen bonding potential based on the geometric characteristics of hydrogen bonds in high-resolution protein crystal structures, and evaluate it using four tests related to the prediction and design of protein structures and protein-protein complexes. The new potential is superior to the widely used Coulomb model of hydrogen bonding in prediction of the sequences of proteins and protein-protein interfaces from their structures, and improves discrimination of correctly docked protein-protein complexes from large sets of alternative structures.
Article
The binding between a protein kinase (PK) and its target is highly specific, despite the fact that many different PKs exhibit significant sequence and structure homology. There must be, then, specificity-determining residues (SDRs) that enable different PKs to recognize their unique substrate. Here we use and further develop a computational procedure to discover putative SDRs (PSDRs) in protein families, whereby a family of homologous proteins is split into orthologous proteins, which are assumed to have the same specificity, and paralogous proteins, which have different specificities. We reason that PSDRs must be similar among orthologs, whereas they must necessarily be different among paralogs. Our statistical procedure and evolutionary model identify such residues by discriminating a functional signal from a phylogenetic one. As case studies we investigate the prokaryotic two-component system and the eukaryotic AGC (i.e., cAMP-dependent PK, cGMP-dependent PK, and PKC) PKs. Without using experimental data, we predict PSDRs in prokaryotic and eukaryotic PKs, and suggest precise mutations that may convert the specificity of one PK to another. We compare our predictions with current experimental results and obtain considerable agreement with them. Our analysis unifies much of existing data on PK specificity. Finally, we find PSDRs that are outside the active site. Based on our results, as well as structural and biochemical characterizations of eukaryotic PKs, we propose the testable hypothesis of "specificity via differential activation" as a way for the cell to control kinase specificity.
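The core intuition, that a specificity-determining position is conserved within each ortholog group but differs between paralog groups, can be sketched as a simple column filter on a gap-free alignment. This is a deliberately crude heuristic with hypothetical names and toy sequences; it does not implement the statistical procedure or evolutionary model the abstract refers to, and in particular it makes no attempt to separate functional from phylogenetic signal.

```python
def putative_sdr_columns(groups):
    """Return alignment columns that are invariant within every group
    and carry a different residue in every group.

    groups maps a group name (one specificity) to its aligned, equal-length sequences.
    """
    n_cols = len(next(iter(groups.values()))[0])
    hits = []
    for col in range(n_cols):
        residues = []
        for seqs in groups.values():
            letters = {s[col] for s in seqs}
            if len(letters) != 1:
                break  # not conserved within this ortholog group
            residues.append(letters.pop())
        else:
            # Conserved in every group: keep it only if all groups differ.
            if len(set(residues)) == len(residues):
                hits.append(col)
    return hits


# Toy alignment: two paralog groups, each with two orthologous sequences.
toy_groups = {
    "kinaseA": ["GKRTL", "GKRSL"],
    "kinaseB": ["GEDTL", "GEDSL"],
}
columns = putative_sdr_columns(toy_groups)
```

In the toy alignment, columns 1 and 2 (zero-based) are flagged, column 0 is rejected because it is identical across groups, and column 3 is rejected because it varies within a group.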
Article
The formation of complexes between proteins and ligands is fundamental to biological processes at the molecular level. Manipulation of molecular recognition between ligands and proteins is therefore important for basic biological studies and has many biotechnological applications, including the construction of enzymes, biosensors, genetic circuits, signal transduction pathways and chiral separations. The systematic manipulation of binding sites remains a major challenge. Computational design offers enormous generality for engineering protein structure and function. Here we present a structure-based computational method that can drastically redesign protein ligand-binding specificities. This method was used to construct soluble receptors that bind trinitrotoluene, L-lactate or serotonin with high selectivity and affinity. These engineered receptors can function as biosensors for their new ligands; we also incorporated them into synthetic bacterial signal transduction pathways, regulating gene expression in response to extracellular trinitrotoluene or L-lactate. The use of various ligands and proteins shows that a high degree of control over biomolecular recognition has been established computationally. The biological and biosensing activities of the designed receptors illustrate potential applications of computational design.
Article
Hydrogen bond interactions were surveyed in a set of protein structures. Compared to surface positions, polar side-chains at core positions form a greater number of intra-molecular hydrogen bonds. Furthermore, the majority of polar side-chains at core positions form at least one hydrogen bond to main-chain atoms that are not involved in hydrogen bonds to other main-chain atoms. Based on this structural survey, hydrogen bond rules were generated for each polar amino acid for use in protein core design. In the context of protein core design, these prudent polar rules were used to eliminate from consideration polar amino acid rotamers that do not form a minimum number of hydrogen bonds. As an initial test, the core of Escherichia coli thioredoxin was selected as a design target. For this target, the prudent polar strategy resulted in a minor increase in computational complexity compared to a strategy that did not allow polar residues. Dead-end elimination was used to identify global minimum energy conformations for the prudent polar and no polar strategies. The prudent polar strategy identified a protein sequence that was thermodynamically stabilized by 2.5 kcal/mol relative to wild-type thioredoxin and 2.2 kcal/mol relative to a thioredoxin variant whose core was designed without polar residues.