Probing the “Dark Matter” of Protein Fold Space

Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, London, UK.
Structure (Impact Factor: 5.62). 09/2009; 17(9):1244-52. DOI: 10.1016/j.str.2009.07.012
Source: PubMed


We used a protein structure prediction method to generate a variety of folds as alpha-carbon models with realistic secondary structures and good hydrophobic packing. The prediction method used only idealized constructs that are not based on known protein structures or fragments of them, producing an unbiased distribution. Models were compared with native folds using a topology-based method, because superposition can be relied on only for similar structures. When all the models were compared to a nonredundant set of all known structures, only one in ten was found to have a match. This large excess of novel folds was associated with each protein probe and, if true in general, implies that the space of possible folds is larger than the space of realized folds, in much the same way that sequence space is larger than fold space. The large excess of novel folds exhibited no unusual properties and has been likened to cosmological dark matter.



Available from: William R Taylor, Oct 13, 2015
  • Source
    • "template libraries or synthetic structures. The former can be used to develop more sensitive threading approaches; the latter are widely used in studies on the completeness of protein structure space [29] as well as in research focusing on the origin of folds and protein universe [30,31]. The effective procedure for the design of a quasi-stable sequence for an arbitrary structure also provides a desired linkage between protein structure and function in computer experiments. "
    ABSTRACT: Many structural bioinformatics approaches employ sequence profile-based threading techniques. To improve fold recognition rates, homology searching may include artificially evolved amino acid sequences, which were demonstrated to enhance the sensitivity of protein threading in targeting midnight zone templates. We describe implementation details of eVolver, an optimization algorithm that evolves protein sequences to stabilize the respective structures by a variety of potentials, which are compatible with those commonly used in protein threading. In a case study focusing on the LARG PDZ domain, we show that artificially evolved sequences are highly capable of recognizing the correct protein structures using standard sequence profile-based fold recognition. Computationally designed protein sequences can be incorporated in existing sequence profile-based threading approaches to increase their sensitivity. They also provide a desired linkage between protein structure and function in in silico experiments that relate to, e.g., the completeness of protein structure space, the origin of folds and the protein universe. eVolver is freely available as a user-friendly webserver and a well-documented stand-alone software distribution at
    BMC Research Notes 07/2013; 6(1):303. DOI:10.1186/1756-0500-6-303
  • Source
    • "Large proteins demonstrate (see [56] and Fig. 3B) a comparatively low 'relative contact order', CO [57] (see Section 7), which means that their folds do not contain many long closed loops that would slow down their folding, while the large protein structures with high relative contact order are excluded: they are not suitable for sufficiently fast folding. It is noteworthy that, in contrast to what is observed for small proteins, the space of all possible (from the viewpoint of stability only) folds of proteins of >100 residues has been reported to be much larger than the space of folds found in nature [58] "
    ABSTRACT: Experimentally measured rates of spontaneous folding of single-domain globular proteins range from microseconds to hours: the difference (11 orders of magnitude!) is akin to the difference between the life span of a mosquito and the age of the Universe. We show that physical theory with biological constraints outlines the possible range of folding rates for single-domain globular proteins of various size and stability, and that the experimentally measured folding rates fall within this narrow "golden triangle" built without any adjustable parameters, filling it almost completely. This "golden triangle" also successfully predicts the maximal allowed size of the "foldable" protein domains, as well as the maximal size of protein domains that fold under solely thermodynamic (rather than kinetic) control. In conclusion, we give a phenomenological formula for dependence of the folding rate on the size, shape and stability of the protein fold.
    FEBS letters 05/2013; 587(13). DOI:10.1016/j.febslet.2013.04.041 · 3.17 Impact Factor
  • Source
    • "Of these, over 2000 have unique folds with the reduction resulting through the same fold being derived from different starting probe proteins. Interestingly, most of these do not occur in the PDB (Taylor et al., 2009). "
    ABSTRACT: Protein structure comparison by pairwise alignment is commonly used to identify highly similar substructures in pairs of proteins and provide a measure of structural similarity based on the size and geometric similarity of the match. These scores are routinely applied in analyses of protein fold space under the assumption that high statistical significance is equivalent to a meaningful relationship; however, the truth of this assumption has previously been difficult to test, since there is a lack of automated methods that do not rely on the same underlying principles. To resolve this, we present a method based on topological descriptions of global protein structure, providing an independent means to assess the ability of structural alignment to maintain meaningful structural correspondences on a large scale. Using a large set of decoys of specified global fold, we benchmark three widely used methods for structure comparison, SAP, TM-align and DALI, and test the degree to which this assumption is justified for these methods. Application of a topological edit distance measure to provide a scale of the degree of fold change shows that, while there is a broad correlation between high structural alignment scores and low edit distances, there remain many pairs with highly significant scores that differ by core strand swaps and are therefore structurally different on a global level. Possible causes of this problem and its meaning for present assessments of protein fold space are discussed.
    Computational biology and chemistry 06/2011; 35(3):174-88. DOI:10.1016/j.compbiolchem.2011.04.008 · 1.12 Impact Factor
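
The "relative contact order" (CO) mentioned in the second excerpt above is commonly defined as the average sequence separation of contacting residue pairs divided by chain length, CO = (1 / (L · N)) Σ |i − j|, summed over the N contacts in a chain of L residues. The following is a minimal sketch of that calculation, not code from any of the papers listed here. It assumes alpha-carbon coordinates only, and the 8 Å contact cutoff, the `min_separation` parameter and the function name are illustrative choices (the original definition of contact order uses contacts between heavy atoms with a shorter cutoff).

```python
from math import dist


def relative_contact_order(ca_coords, cutoff=8.0, min_separation=1):
    """Relative contact order from a list of (x, y, z) alpha-carbon positions.

    CO = (1 / (L * N)) * sum(|i - j|) over the N residue pairs in contact.
    The cutoff (in the same units as the coordinates) and min_separation
    are assumptions made for this sketch, not published parameters.
    """
    length = len(ca_coords)
    total_separation = 0
    n_contacts = 0
    for i in range(length):
        for j in range(i + min_separation, length):
            # Two residues are "in contact" if their alpha carbons are close.
            if dist(ca_coords[i], ca_coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0 or length == 0:
        return 0.0
    return total_separation / (length * n_contacts)


# Toy example: a straight four-residue chain versus the same chain bent
# into a square so that the two ends touch (a single long closed loop).
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
looped = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(relative_contact_order(straight, cutoff=1.1))
print(relative_contact_order(looped, cutoff=1.1))
```

In the toy example the looped chain scores higher than the straight one, because the contact between its ends spans the whole sequence. This is the sense in which a fold with many long closed loops has a high relative contact order.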