Probing the "dark matter" of protein fold space.
ABSTRACT: We used a protein structure prediction method to generate a variety of folds as alpha-carbon models with realistic secondary structures and good hydrophobic packing. The prediction method used only idealized constructs that are not based on known protein structures or fragments of them, producing an unbiased distribution. Models were compared with native folds using a topology-based method, because superposition is reliable only for similar structures. When all the models were compared with a nonredundant set of all known structures, only one in ten had a match. This large excess of novel folds was observed with each protein probe and, if true in general, implies that the space of possible folds is larger than the space of realized folds, in much the same way that sequence space is larger than fold space. The novel folds exhibited no unusual properties, and this large excess has been likened to cosmological dark matter.
Source available from: PubMed Central
ABSTRACT: In 2008, I reviewed and proposed a model for our 2005 discovery that unrefoldable and insoluble proteins could in fact be solubilized in unsalted water. Since then, this discovery has offered us and other groups a powerful tool for characterizing insoluble proteins, and we have further addressed several fundamental and disease-relevant issues associated with it. Here I review these results, which are conceptualized into several novel scenarios. 1) Unlike 'misfolded proteins', which retain the capacity to fold into well-defined structures but are misled into 'off-pathway' aggregation, unrefoldable and insoluble proteins completely lack this ability and will unavoidably aggregate in vivo in the presence of ~150 mM ions; they are therefore designated here as 'intrinsically insoluble proteins' (IIPs). IIPs may largely account for the 'wastefully synthesized' DRiPs (defective ribosomal products) identified in human cells. 2) The fact that IIPs, including membrane proteins, are all soluble in unsalted water but aggregate upon exposure to ions suggests that ions present in the background play a central role in mediating protein aggregation, thus acting as 'dark mediators'. Our study with 14 salts confirms that IIPs lack the capacity to fold into any well-defined structure. We found that salts modulate protein dynamics and that anions bind proteins with high selectivity and affinity, an effect that is surprisingly masked by pre-existing ions. Accordingly, I modified my previous model. 3) Insoluble proteins interact with lipids to different degrees. Remarkably, an ALS-causing P56S mutation transforms the β-sandwich MSP domain into a helical integral membrane protein. Consequently, the number of membrane-interacting proteins might be much larger than currently recognized. Attacking biological membranes may represent a common mechanism by which aggregated proteins initiate human diseases. 4) Our discovery also implies a solution to the 'chicken-and-egg paradox' for the origin of primitive membranes embedded with integral membrane proteins, if proteins originally emerged in unsalted prebiotic media. F1000Research. 01/2013; 2:94.
ABSTRACT: It is known that topologically different proteins of the same class sometimes share the same spatial arrangement of secondary structure elements (SSEs). However, the frequency with which topologically different structures share the same spatial arrangement of SSEs is unclear. Estimating this frequency is important because it provides both a deeper understanding of the geometry of protein folds and a valuable hint for predicting protein structures with novel folds. Here we clarified the frequency with which protein folds share the same SSE packing arrangement with other folds, the types of spatial arrangement of SSEs that are frequently observed across different folds, and the diversity of protein folds that share the same spatial arrangement of SSEs with a given fold, using MICAN, a protein structure alignment program that we have been developing. By performing a comprehensive structural comparison of SCOP fold representatives, we found that approximately 80% of protein folds share the same spatial arrangement of SSEs with other folds. We also observed that many protein pairs that share the same spatial arrangement of SSEs belong to different classes, often with opposite N- to C-terminal directions of the polypeptide chain. The most frequently observed spatial arrangement of SSEs was the 2-layer α/β packing arrangement, which was dispersed among as many as 27% of SCOP fold representatives. These results suggest that the same spatial arrangements of SSEs are adopted by a wide variety of different folds and that the spatial arrangement of SSEs is highly robust against the N- to C-terminal direction of the polypeptide chain. PLoS ONE 09/2014; 9(9):e107959. · 3.53 Impact Factor
ABSTRACT: The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey the effects of protein mutations, highlighting the physical basis for the marginal stability of natural globular proteins and how the requirement for kinetic stability and the avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict the biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. Journal of The Royal Society Interface 08/2014; 11(100):20140419. · 3.86 Impact Factor
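The abstract above mentions sequence–structure mappings built on coarse-grained models and says they rationalize mutational robustness, but it does not name a particular model. As an illustration only, the following Python sketch assumes the two-dimensional HP lattice model. In this model each residue is hydrophobic (H) or polar (P), a conformation is a self-avoiding walk on a square lattice, and the energy is -1 for every pair of H residues that touch on the lattice without being chain neighbours. The function names (`canonical_walks`, `hp_energy`, `ground_states`, `mutational_robustness`) and the robustness measure are hypothetical choices made for this sketch, not the method of the reviewed work. Robustness is taken here to be the fraction of single-site H/P flips that leave the unique ground-state conformation unchanged.

```python
# Hypothetical sketch: the 2D HP lattice model is assumed here as a stand-in
# for the coarse-grained models the abstract refers to; it is not the reviewed
# work's own method.

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def canonical_walks(n):
    """Yield self-avoiding walks of n residues on the square lattice, one per
    rotation/reflection class: the first bond points along +x and the first
    residue that leaves the x-axis (if any) lies at y > 0."""
    if n < 1:
        raise ValueError("chain must have at least one residue")
    start = [(0, 0), (1, 0)][:n]

    def extend(path, occupied):
        if len(path) == n:
            # Keep only one of each pair of mirror images across the x-axis.
            first_off_axis = next((y for _, y in path if y != 0), 0)
            if first_off_axis >= 0:
                yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in occupied:  # self-avoidance
                path.append(site)
                occupied.add(site)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(site)

    yield from extend(list(start), set(start))


def hp_energy(seq, coords):
    """Return -1 per H-H pair that are lattice neighbours but not chain neighbours."""
    index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only in the +x and +y directions so each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def ground_states(seq):
    """Enumerate all conformations; return (lowest energy, conformations at it)."""
    best, states = 0, []
    for walk in canonical_walks(len(seq)):
        e = hp_energy(seq, walk)
        if e < best:
            best, states = e, [walk]
        elif e == best:
            states.append(walk)
    return best, states


def mutational_robustness(seq):
    """Fraction of single-site H<->P flips that keep the same unique ground state."""
    _, native = ground_states(seq)
    if len(native) != 1:
        raise ValueError("sequence does not fold to a unique ground state")
    kept = 0
    for i, residue in enumerate(seq):
        mutant = seq[:i] + ("P" if residue == "H" else "H") + seq[i + 1:]
        if ground_states(mutant)[1] == native:
            kept += 1
    return kept / len(seq)
```

For example, `mutational_robustness("HPPH")` flips each of the four residues in turn and counts how many mutants keep the U-shaped ground state. The enumeration is exhaustive, and the number of self-avoiding walks grows exponentially with chain length, so this sketch is practical only for toy chains of about a dozen residues.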