Transferable Coarse-Grained Potential for De Novo Protein Folding and Design

PLOS
PLOS ONE
Abstract and Figures

Protein folding and design are major biophysical problems, the solution of which would lead to important applications, especially in medicine. Here a novel protein model capable of simultaneously providing quantitative protein design and folding is introduced. With computer simulations it is shown that, for a large set of real protein structures, the model produces designed sequences with physical properties similar to those of the corresponding naturally occurring sequences. The designed sequences are not yet fully realistic and require further experimental testing. For an independent set of proteins, notoriously difficult to fold, the correct folding of both the designed and the natural sequences is also demonstrated. The folding properties are characterized by free energy calculations, which are not only consistent among natural and designed proteins but also show remarkable precision when the folded structures are compared to the experimentally determined ones. Ultimately, this novel coarse-grained protein model is unique in the combination of its three fundamental features: its simplicity, its ability to produce natural foldable designed sequences, and its structure prediction precision, the latter demonstrated by free energy calculations. It is also remarkable that low-frustration sequences can be obtained with such a simple and universal design procedure, and that the folding of natural proteins shows funnelled free energy landscapes without the need for any potentials based on the native structure.
... Such a large alphabet provides proteins with the huge variety of configurations and functions that we know to date. The approach of artificial protein evolution (also known as protein design) (Dahiyat and Mayo, 1997;Koehl and Levitt, 1999;Kortemme and Baker, 2004;Fung et al., 2008;Coluzza, 2011;Koga et al., 2012;Coluzza, 2014;Sevy et al., 2015;Bianco et al., 2017;Chevalier et al., 2017;Marcos et al., 2017;Nerattini et al., 2018) opens the possibility of addressing fundamental questions about the nature of the amino acid alphabets (Davidson and Sauer, 1994;Davidson et al., 1995;Cordes et al., 1996;Riddle et al., 1997). One of the questions that has most attracted the attention of the scientific community concerns the universality of the 20 letters. ...
... The systems considered are composed of the modelled protein (MP) and a potential binding partner (a fragment of a protein that mimics, with a shallow shape, a potential binding site of a larger protein). Both the protein and the binding partner are represented with the Caterpillar coarse-grained model, which aims to successfully design and refold natural and artificial proteins (Coluzza, 2011;2014;Wang et al., 2019). At the same time, sequences were designed for the protein-partner system following different evolution pathways, so as to select for optimal folding of the protein and optimal interaction with the binding partner. ...
... From the scenario described, we can draw two important conclusions: first, a molecule with a reduced alphabet of 4 letters can produce a funnel-like folding free energy landscape; second, with 6 letters we recovered the folding accuracy of previous protein model designs made with 20 letters (Coluzza, 2014). Our results are consistent with the experimental observation that 6 letters are a minimal set necessary for maintaining protein structure and function (Plaxco et al., 1998;Chan, 1999;Wang and Wang, 1999;Murphy et al., 2000;Reetz and Wu, 2008;Solis, 2015). ...
Article
Understanding the origin of the twenty-letter alphabet of proteins is a substantial biophysical and non-deterministic polynomial-time hard (NP-hard) problem. Studies have concentrated on the effect of a reduced alphabet on the folding properties. However, the natural alphabet is a compromise between versatility and optimisation of the available resources. In the current work, the additional effect of the relative availability of the amino acids is incorporated in a protein design strategy that involves the competition for resources between a protein and its potential interaction partner. Moreover, it also allows the effect of the reduced alphabet on protein-protein interactions to be explored. The optimal reduced set of letters for the design of the protein is identified, and it was observed that the alphabet could be reduced to 4 letters when only single-protein folding is considered. However, it is only with 6 letters that optimal folding is achieved, thus recovering experimental observations. It was also observed that binding between the protein and a potential interaction partner could not be avoided with the reduced alphabets studied. We therefore suggest that aggregation could have been the main driving force for the evolution of the large protein alphabet. Communicated by Ramaswamy H. Sarma
... We employ the Caterpillar coarse-grained protein model that reduces the complexity of the amino acids to the backbone atoms: C, O, Cα, N, H. Although the full Hamiltonian of the Caterpillar model, as described in Refs. [61,62] (see SI for details), explicitly depends on the specific orientation of the backbone atoms, in the design procedure we only consider the energy terms that are directly affected by the amino acid identities, since the protein conformation is not varied during the simulation. ...
... As fully described in Refs. [61,62], the Caterpillar model, in combination with Virtual Move Parallel Tempering (VMPT), which includes an adaptive umbrella sampling technique [64,65], is capable of producing a large number of sequences that should fold into realistic protein target structures (see SI for details). We extensively verified the latter hypothesis in our previous studies (e.g. Figure 4 of [61]); it is also supported by the Random Energy Model [39,66,67], which predicts that two sequences with the same energy on the target structure are equivalent solutions to the folding problem. ...
... The Caterpillar model fulfils such a requirement [61,62]. This property also means that any model or force field capable of generating sequences and refolding them into the natural backbone structure would be usable for the purpose of our analysis. The first evidence to support our claim comes from our previous work on heteropolymer design, including the Caterpillar design. Our work on design showed that, provided that a heteropolymer chain is designable (we defined the rules to identify such a property), the 3D structures can be designed with high accuracy independently of the interaction matrix used to define the amino acid interactions. ...
Article
Protein sequence stores the information relative to both functionality and stability, thus making it difficult to disentangle the two contributions. However, the identification of critical residues for function and stability has important implications for the mapping of the proteome interactions, as well as for many pharmaceutical applications, e.g. the identification of ligand binding regions for targeted pharmaceutical protein design. In this work, we propose a computational method to identify critical residues for protein functionality and stability and to further categorise them into strictly functional, structural and intermediate. We evaluate single site conservation and use Direct Coupling Analysis (DCA) to identify co‐evolved residues both in natural and artificial evolution processes. We reproduce artificial evolution using protein design and base our approach on the hypothesis that artificial evolution in the absence of any functional constraint would exclusively lead to site conservation and co‐evolution events of the structural type. Conversely, natural evolution intrinsically embeds both functional and structural information. By comparing the lists of conserved and co‐evolved residues, outcomes of the analysis on natural and artificial evolution, we identify the functional residues without the need for any a priori knowledge of the biological role of the analysed protein.
... Our calculations predict that to design heteropolymers with two directional interaction sites per monomer (e.g., proteins), the minimum alphabet necessary is composed of just four letters. This prediction is consistent with the experimental observation that just five letters are enough to encode structural information in proteins [36][37][38][39][40][41] and with a study performed with the Caterpillar protein model [42,43] that will be presented in an upcoming publication. ...
... We ranked common polymer monomers according to our prediction of their designability, and we identified polyurea, polyamide, and polyurethane as optimal choices for the synthesis of designable heteropolymers. It is important to mention that the method can be applied to compare the designability of different protein models and to estimate the minimum alphabet necessary to design them, which already appears to be four letters for the Caterpillar coarse-grained model [43], as suggested by an upcoming work of some of the present authors. As a future perspective, we are planning to measure the dependence of ω on the bending rigidity and for combinations of particles with different geometries. ...
Article
Understanding how to design the structure of heteropolymers through their monomer sequence will have a significant impact on the creation of novel artificial materials. According to mean‐field theories, the minimum number, or alphabet, of distinct monomers necessary to achieve such designability is directly related to the conformational entropy ω of compact polymer structures. Here, a computational strategy is introduced to calculate this conformational entropy and thus predict the minimum alphabet needed to achieve designability for a generalized heteropolymer model. The comparison of the predictions with previous results proves the robustness of the approach. It is quantified for the first time how critical the number of directional interactions is for achieving designability. The methodology that is introduced can be easily generalized to models representing specific polymers. A comparison between conventional polymer monomers is provided, and it is predicted that polyurea, polyamide, and polyurethane residues are optimal candidates to be functionalized for the experimental synthesis of designable heteropolymers. As such, our method can guide the engineering of new types of self‐assembling modular polymers that will open new possibilities for polymer‐based materials with unmatched versatility and control.
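The link between alphabet size and conformational entropy can be illustrated with a small calculation. The sketch below assumes one commonly quoted form of the mean-field criterion, ln q > ω, with ω the conformational entropy per monomer in units of k_B. The abstract itself only says the two quantities are "directly related", so this exact form, the function name `minimum_alphabet` and the sample ω values are all assumptions for illustration, not results from the paper.

```python
import math


def minimum_alphabet(omega):
    """Smallest integer q satisfying ln(q) > omega, i.e. q > exp(omega).

    ASSUMPTION: the designability criterion is taken as ln(q) > omega,
    with omega the conformational entropy per monomer in units of k_B.
    """
    return math.floor(math.exp(omega)) + 1


# Hypothetical omega values, chosen only to show how q_min grows with omega.
for omega in (0.5, 1.0, 1.5, 2.0):
    print(f"omega = {omega:.1f} -> minimum alphabet = {minimum_alphabet(omega)}")
```

Under this assumed criterion, any constraint that lowers the entropy of compact structures, such as directional interactions, also lowers the alphabet size needed for design.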
... We observe that the effective alphabet grows from 4 to 6 letters going from larger (ζ = 0.20 and 0.40) to smaller Γ proteins (ζ = 0.60 and 0.80) respectively. It is interesting to notice that the alphabets are made of amino acids with an average attractive pair-interaction energy and high variability in terms of the residue-solvent interactions (see Table S1 in ref. 9 ). Moreover, the alphabets differ from each other (letters GKVY and GKRV corresponding to ζ = . ...
Article
Isolating the properties of proteins that allow them to convert sequence into the structure is a long-lasting biophysical problem. In particular, studies focused extensively on the effect of a reduced alphabet size on the folding properties. However, the natural alphabet is a compromise between versatility and optimisation of the available resources. Here, for the first time, we include the impact of the relative availability of the amino acids to extract from the 20 letters the core necessary for protein stability. We present a computational protein design scheme that involves the competition for resources between a protein and a potential interaction partner that, additionally, gives us the chance to investigate the effect of the reduced alphabet on protein-protein interactions. We devise a scheme that automatically identifies the optimal reduced set of letters for the design of the protein, and we observe that even alphabets reduced down to 4 letters allow for single protein folding. However, it is only with 6 letters that we achieve optimal folding, thus recovering experimental observations. Additionally, we notice that the binding between the protein and a potential interaction partner could not be avoided with the investigated reduced alphabets. Therefore, we suggest that aggregation could have been a driving force in the evolution of the large protein alphabet.
Article
Coarse-graining is commonly used to decrease the computational cost of simulations. However, coarse-grained models are also considered to have lower transferability, with lower accuracy for systems outside the original scope of parametrization. Here, we benchmark a bead-necklace model and a modified Martini 2 model, two coarse-grained models with different degrees of coarse-graining, for a set of intrinsically disordered proteins. The SOP-IDP model has earlier been used for this set of proteins; thus, those results are included in this study to show how models with different levels of coarse-graining compare. The sometimes naive expectation that the least coarse-grained model performs best does not hold true for the experimental pool of proteins used here. Instead, it showed the poorest agreement, indicating that one should not necessarily trust the otherwise intuitive notion that a more advanced model is inherently the better choice.
Article
Nanoparticles (NPs) and other engineered nanomaterials have great potential as nanodrugs or nanomedical devices for biomedical applications. However, the adsorption of proteins in blood circulation or similar physiological fluids can significantly alter the surface properties and therapeutic response induced by most nanomaterials. For example, interaction with proteins can change the bloodstream circulation time and availability of therapeutic NPs or hinder the accumulation in their desired target organs. Proteins can also trigger or prevent agglomeration. By combining experimental and computational approaches, we have developed NPs carrying polyethylene glycol (PEG) polymeric coatings that mimic the surface charge distribution of proteins typically found in blood, which are known to show low aggregation under normal blood conditions. Here, we show that NPs with coatings based on apoferritin or human serum albumin display better antifouling properties and weaker protein interaction compared to similar NPs carrying conventional PEG polymeric coatings.
Article
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins’ most remarkable feature is their modularity. The large amount of information required to specify each protein’s function is analogically encoded with an alphabet of just ~20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Article
Protein design is the inverse approach of the three-dimensional (3D) structure prediction for elucidating the relationship between the 3D structures and amino acid sequences. In general, the computation of the protein design involves a double loop: A loop for amino acid sequence changes and a loop for an exhaustive conformational search for each amino acid sequence. Herein, we propose a novel statistical mechanical design method using Bayesian learning, which can design lattice proteins without the exhaustive conformational search. We consider a thermodynamic hypothesis of the evolution of proteins and apply it to the prior distribution of amino acid sequences. Furthermore, we take the water effect into account in view of the grand canonical picture. As a result, on applying the 2D lattice hydrophobic-polar (HP) model, our design method successfully finds an amino acid sequence for which the target conformation has a unique ground state. However, the performance was not as good for the 3D lattice HP models compared to the 2D models. The performance of the 3D model improves on using a 20-letter lattice proteins. Furthermore, we find a strong linearity between the chemical potential of water and the number of surface residues, thereby revealing the relationship between protein structure and the effect of water molecules. The advantage of our method is that it greatly reduces computation time, because it does not require long calculations for the partition function corresponding to an exhaustive conformational search. As our method uses a general form of Bayesian learning and statistical mechanics and is not limited to lattice proteins, the results presented here elucidate some heuristics used successfully in previous protein design methods.
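To make the "unique ground state" criterion concrete, the following sketch implements the standard 2D lattice HP energy (each non-bonded H-H nearest-neighbour contact contributes -1) together with the brute-force conformational enumeration that the abstract says the proposed method avoids. It is not the authors' Bayesian design method; the function names and the test sequence are hypothetical, and the enumeration is only practical for very short chains.

```python
# Unit steps on the square lattice.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(sequence, coords):
    """HP energy: -1 for every non-bonded H-H nearest-neighbour contact."""
    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def self_avoiding_walks(n):
    """Yield all n-site self-avoiding walks, unique up to rotation and reflection."""
    if n < 2:
        raise ValueError("need at least two sites")

    def extend(path, occupied):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        # Until the walk first leaves the x axis, forbid the -y step:
        # this removes mirror images of the same conformation.
        still_straight = all(py == 0 for _, py in path)
        for dx, dy in MOVES:
            if still_straight and dy == -1:
                continue
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied)
            path.pop()
            occupied.remove(nxt)

    # Fixing the first bond along +x removes the four lattice rotations.
    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def ground_state(sequence):
    """Return (minimum energy, number of distinct conformations reaching it)."""
    best_energy, degeneracy = 0, 0
    for walk in self_avoiding_walks(len(sequence)):
        energy = hp_energy(sequence, walk)
        if energy < best_energy:
            best_energy, degeneracy = energy, 1
        elif energy == best_energy:
            degeneracy += 1
    return best_energy, degeneracy


if __name__ == "__main__":
    energy, degeneracy = ground_state("HPPH")  # hypothetical toy sequence
    print(f"E_min = {energy}, degeneracy = {degeneracy}")
```

A sequence counts as successfully designed for a target when the degeneracy is 1 and the single minimum-energy walk is the target conformation. The number of walks grows exponentially with chain length, which is why avoiding this inner loop reduces computation time so much.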
Article
Macromolecular materials with directional interactions such as hydrogen bonds exhibit numerous attractive features in terms of structure, thermodynamics, and dynamics. Besides enabling precise tuning of desirable geometries in the assembled state (e.g., programmable coordination numbers depending on the valency of the directional interaction) and mixing in a blend/composite through stabilization via hydrogen bonds between the various components, hydrogen bonds can also impart responsiveness to external stimuli (e.g., temperature, pH). In biomacromolecules (e.g., proteins, DNA, polysaccharides), hydrogen bonds play a key role in stabilizing secondary and tertiary structures, which in turn define the function of these macromolecules. In this Viewpoint, I present the challenges, successes, and opportunities for molecular modeling and simulations to conduct fundamental and application-focused research on macromolecular materials with hydrogen bonding interactions. The past successes and limitations of atomistic simulations are discussed first, followed by highlights from recent developments in coarse-grained modeling and their use in studies of (synthetic and biologically relevant) macromolecular materials. Model development focused on polynucleotides (e.g., DNA, RNA, etc.), polypeptides, polysaccharides, and synthetic polymers at experimentally relevant conditions is highlighted. This Viewpoint ends with potential future directions for macromolecular modeling and simulations with other types of directional interactions beyond hydrogen bonding.
Article
Intrinsically disordered proteins do not adopt well-defined structures, yet they still play functional roles in many different aspects of biology. Their lack of stable conformations poses new challenges to the quantitative description and understanding of their processes, since they cannot be formulated within the classical terms of structural biology. Polymer physics is emerging as a powerful language to identify, describe, and quantify the molecular determinants of the disordered conformational ensemble. Here, I will review the application of key-concepts of polymer theories to intrinsically disordered proteins, with a particular focus on the role played by residue-residue and residue-solvent interactions in modulating conformational transitions in the disordered structural ensemble.
Article
In 2008, to the consternation of some, one of the editors of this special issue on the “Chemical Physics of Protein Folding,” was quoted as saying, “What was called the protein folding problem 20 years ago is solved” (1). One purpose of this special issue is to drive home this point. The other, more important purpose is to illustrate how workers on the protein folding problem, by moving beyond their early obsession with seeming paradoxes (2), are developing a quantitative understanding of how the simpler biological structures assemble both in vitro and in vivo. The emerging quantitative understanding reveals simultaneously the richness of folding phenomena and the elegant simplicity of the underlying principles of spontaneous biomolecular assembly. The appreciation of these contrasting aspects of the folding problem has come about through the cooperation of theorists and experimentalists, a theme common to all the contributions to this special issue. Although the basic ideas about the folding energy landscape have turned out to be quite simple, entering even into some undergraduate textbooks (3), exploring their consequences in real systems has required painstaking intellectual analysis, as well as detailed computer simulations and experiments that still stretch the bounds of what is feasible. The backgrounds of the contributors to this issue reflect the breadth of the folding field and range from computer science and theoretical physics to molecular biology and organic chemistry. A great deal of the progress in the field can thus be traced to a fairly successful effort to develop a common language and conceptual framework for describing folding.
Article
Knotted chains are a promising class of polymers with many applications for materials science and drug delivery. Here we introduce an experimentally realizable model for the design of chains with controllable topological properties. Recently, we have developed a systematic methodology to construct self-assembling chains of simple particles, with final structures fully controlled by the sequence of particles along the chain. The individual particles forming the chain are colloids decorated with mutually interacting patches, which can be manufactured in the laboratory with current technology. Our methodology is applied to the design of sequences folding into self-knotting chains, in which the end monomers are by construction always close together in space. The knotted structure can then be externally locked simply by controlling the interaction between the end monomers, paving the way to applications in the design and synthesis of active materials and novel carriers for drug delivery.
Chapter
At one point in the history of protein chemistry it was thought that, because of the vast number of possible sequences, proteins ought to exist in a countless number of forms befitting any conceivable function or structure. In fact, as the number of known amino acid sequences continues to mount, it is becoming abundantly clear that there is a practical limit to the number of types of protein structures that exist in living systems on Earth. Thus, although the number of possible sequences for 20 amino acids arranged randomly in strings of 350 units is a superastronomical 20^350, certainly nowhere near that number of protein sequences has ever or will ever exist. Instead, a small number of genetically encoded protein structures has been expanded by the general route of “duplication and modification.” The duplications come in various degrees, from the very short to the supragenic or chromosomal. Postduplication modification mostly takes the form of base substitutions leading to amino acid replacement, and, theoretically, the history of any protein ought to be evident by appropriate comparison of the diverging sequences.
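The size of the sequence space quoted above, 20 raised to the power 350, can be checked directly. The short sketch below is only an arithmetic illustration and the variable names are arbitrary; it computes the exact integer and its order of magnitude.

```python
import math

ALPHABET_SIZE = 20  # amino acid types, as stated in the text
CHAIN_LENGTH = 350  # residues per chain, as stated in the text

# Python integers have arbitrary precision, so this count is exact.
n_sequences = ALPHABET_SIZE ** CHAIN_LENGTH

# Number of decimal digits, read off the exact integer.
n_digits = len(str(n_sequences))

# Order of magnitude via logarithms: log10(20^350) = 350 * log10(20).
log10_estimate = CHAIN_LENGTH * math.log10(ALPHABET_SIZE)

print(f"20^350 has {n_digits} decimal digits")
print(f"log10(20^350) = {log10_estimate:.2f}")
```

The result has 456 decimal digits, roughly 2.3 × 10^455, which is why exhaustive enumeration of sequences is out of the question.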
Article
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.
Article
An intermediate-resolution model of small, homogeneous peptides is introduced, and discontinuous molecular dynamics simulation is applied to study secondary structure formation. Physically, each model residue consists of a detailed three-bead backbone and a simplified single-bead side-chain. Excluded volume and hydrogen bond interactions are constructed with discontinuous (i.e., hard-sphere and square-well) potentials. Simulation results show that the backbone motion of the model is limited to realistic regions of Φ–Ψ conformational space. Model polyalanine chains undergo a locally cooperative transition to form α-helices that are stabilized by backbone hydrogen bonding, while model polyglycine chains tend to adopt nonhelical structures. When side-chain size is increased beyond a critical diameter, steric interactions prevent formation of long α-helices. These trends in helicity as a function of residue type have been well documented by experimental, theoretical, and simulation studies and demonstrate the ability of the intermediate-resolution model developed in this work to accurately mimic realistic peptide behavior. The efficient algorithm used permits observation of the complete helix–coil transition within 15 min on a single-processor workstation, suggesting that simulations of very long times are possible with this model. Proteins 2001;44:344–360. © 2001 Wiley-Liss, Inc.
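The discontinuous potentials named in this abstract have a simple functional form. The sketch below writes the textbook hard-sphere and square-well pair potentials. The parameter names (`sigma`, `well_width`, `epsilon`) and the numerical values are placeholders chosen for illustration and are not the parameters of the published model.

```python
import math


def hard_sphere(r, sigma):
    """Infinite repulsion for separations below the diameter sigma, zero beyond."""
    return math.inf if r < sigma else 0.0


def square_well(r, sigma, well_width, epsilon):
    """Hard core below sigma, constant attraction -epsilon out to well_width * sigma."""
    if r < sigma:
        return math.inf
    if r < well_width * sigma:
        return -epsilon
    return 0.0


# Placeholder parameters in reduced units (hypothetical, for illustration only).
for r in (0.8, 1.2, 2.0):
    u = square_well(r, sigma=1.0, well_width=1.5, epsilon=1.0)
    print(f"r = {r:.1f} -> u = {u}")
```

Because these potentials are piecewise constant, the force vanishes between discontinuities. A discontinuous molecular dynamics simulation can therefore advance from one collision or well-crossing event to the next instead of integrating in small time steps, consistent with the efficiency the abstract reports.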
Book
Topics covered include: EEG analysis, computer methods in electrocardiography, thermal properties of biomaterials, the physiological model, computer-aided instruction in medicine, chemotaxis in bacteria, muscle structure and contraction, electron microspectroscopy, spatial organization in animal development, determination of structure by neutron scattering, application of intensity fluctuation spectroscopy to molecular biology, physical state of diffusible ions in cells, fluorescent probes in nerve membranes, concentration correlation spectroscopy, antibiotics and membrane biology, biomedical materials, calcium transport, artificial kidneys, survival distribution, computer monitoring in patient care, structure of tRNA, and computers in the clinical pathologic laboratory. Separate abstracts were prepared for two papers. (MCG)
Article
We develop a coarse-grained model where solvent is considered implicitly, electrostatics are included as short-range interactions and side-chains are coarse-grained to a single bead. The model depends on three main parameters: hydrophobic, electrostatic and side-chain hydrogen bond strength. The parameters are determined by considering three levels of approximation and characterizing the folding for three selected proteins (training set). Nine additional proteins (containing up to 126 residues) as well as mutated versions (test set) are folded with the given parameters. In all folding simulations, the initial state is a random coil configuration. Besides the native state, some proteins fold into an additional state differing in the topology (structure of the helical bundle). We discuss the stability of the native states, and compare the dynamics of our model to all-atom molecular dynamics simulations as well as some general properties of the interactions governing folding dynamics. Proteins 2013. © 2013 Wiley Periodicals, Inc.
Article
Bell System Technical Journal, also pp. 623-656 (October)