Effective connectivity profile: a structural representation that evidences the relationship between protein structures and sequences.

Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain.
Proteins Structure Function and Bioinformatics (Impact Factor: 3.34). 07/2008; 73(4):872-88. DOI: 10.1002/prot.22113
Source: PubMed

ABSTRACT The complexity of protein structures calls for simplified representations of their topology. The simplest possible mathematical description of a protein structure is a one-dimensional profile representing, for instance, buriedness or secondary structure. This kind of representation has been introduced for studying the sequence to structure relationship, with applications to fold recognition. Here we define the effective connectivity profile (EC), a network theoretical profile that self-consistently represents the network structure of the protein contact matrix. The EC profile makes mathematically explicit the relationship between protein structure and protein sequence, because it allows predicting the average hydrophobicity profile (HP) and the distributions of amino acids at each site for families of homologous proteins sharing the same structure. In this sense, the EC provides an analytic solution to the statistical inverse folding problem, which consists of finding the statistical properties of the set of sequences compatible with a given structure. We tested these predictions with simulations of the structurally constrained neutral (SCN) model of protein evolution with structure conservation, for single- and multi-domain proteins, and for a wide range of mutation processes, the latter producing sequences with very different hydrophobicity profiles. We found that the EC-based predictions are accurate even when only one sequence of the family is known. The EC profile is very significantly correlated with the HP for sequence-structure pairs in the PDB as well. The EC profile generalizes the properties of previously introduced structural profiles to modular proteins such as multidomain chains, and its correlation with the sequence profile is substantially improved with respect to the previously defined profiles, particularly for long proteins.
Furthermore, the EC profile has a dynamic interpretation, since the EC components are strongly inversely related to the temperature factors measured in X-ray experiments, meaning that positions with a large EC component are more strongly constrained in their equilibrium dynamics. The EC profile also allows one to define a natural measure of modularity that correlates with the number of domains composing the protein, suggesting its application for domain decomposition. Finally, we show that structurally similar proteins have similar EC profiles, so that the similarity between aligned EC profiles can be used as a structure similarity measure, a property that we have recently applied for protein structure alignment. The code for computing the EC profile is available upon request writing to, and the structural profiles discussed in this article can be downloaded from the SLOTH webserver
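The abstract does not give the formula for the EC profile, so the following Python sketch does not compute it. As an illustration of the general idea of comparing a one-dimensional structural profile with a hydrophobicity profile, it uses the principal eigenvector of a binary contact matrix as a stand-in profile. The 8 Å cutoff, the minimum sequence separation of 3, the idealized helical coordinates, the toy sequence, and the use of the Kyte-Doolittle scale are all assumptions made for this example and are not taken from the article.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale, used here only as a stand-in
# hydrophobicity scale (an assumption of this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def contact_matrix(coords, cutoff=8.0, min_separation=3):
    """Binary contact matrix from an (N, 3) array of residue coordinates.

    Residues i and j are in contact if they are closer than `cutoff`
    and at least `min_separation` positions apart along the chain.
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    separated = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return ((dist < cutoff) & separated).astype(float)


def principal_eigenvector(matrix):
    """Unit-norm eigenvector of the largest eigenvalue of a symmetric matrix."""
    _, eigvecs = np.linalg.eigh(matrix)  # eigenvalues come in ascending order
    v = eigvecs[:, -1]
    return v if v.sum() >= 0 else -v  # fix the arbitrary overall sign


def pearson(x, y):
    """Pearson correlation coefficient between two profiles."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy input: 20 points on an idealized helix (hypothetical coordinates).
n = 20
i = np.arange(n)
angle = np.deg2rad(100.0 * i)
coords = np.column_stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * i])
toy_sequence = "ALKEIAELLKKAGELIVKLA"  # hypothetical 20-residue sequence

c = contact_matrix(coords)
profile = principal_eigenvector(c)
hp = np.array([KYTE_DOOLITTLE[a] for a in toy_sequence])
r = pearson(profile, hp)
print(f"correlation between structural profile and hydrophobicity: {r:.3f}")
```

With real data, the coordinates would come from a PDB structure and the hydrophobicity profile from the corresponding sequence or family alignment. The toy correlation printed here carries no biological meaning.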

  • ABSTRACT: The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We start by discussing positive design, which strengthens the stability of the folded state against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. In contrast, negative design, which consists of destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement conflicts with negative design, and evolution has to trade off between them. Finally, selection can accelerate functional movements by favoring low-frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformational change.
    Biomolecules. 03/2014; 4(1):291-314.
  • ABSTRACT: While sequence-based methods are widely used as reliable tools for protein function prediction in general, these methods are likely to fail in cases of low sequence similarity. This is because proteins with low sequence similarity may nevertheless have similar functions and exhibit similar structures. In such cases, structure-based comparison methods can help to provide further insights owing to the widely accepted paradigm that structure mirrors function. Moreover, thanks to the steady increase in structural information with the advent of structural genomics projects and the steady improvements in structure prediction, these methods are becoming more and more applicable. Many structure-based approaches to the comparative analysis of proteins and the inference of protein function rely on graph formalisms for modeling protein structures and, correspondingly, employ graph-theoretic algorithms for analyzing and comparing such structures. This review is devoted to approaches of that kind and presents an overview of the most important graph-based algorithms.
    Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 09/2013; 3(5). · 1.42 Impact Factor
  • ABSTRACT: This paper builds upon the fundamental paper by Niwa et al. (2009), which offers the unique possibility of analyzing the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free standardized microenvironment. The difficulty of the problem comes from the superposition of the driving forces of intra- and intermolecular interactions, and it is mirrored by evidence that single-point mutations can shift the phenotype from folding to aggregation (doi:10.1021/ja1116233). Here we apply different state-of-the-art classification methods from the field of structural pattern recognition, with the aim of comparing different representations of the same proteins of the Niwa et al. database, going from pure sequence to chemico-physical labeled (contact) graphs. Through this comparison, we are able to identify some interesting general properties of the protein universe, from the confirmation of a threshold size around 250 residues (discriminating "easily foldable" from "difficultly foldable" molecules, consistent with other independent data on protein domain architecture) to the relevance of contact graph eigenvalue ordering for folding behavior discrimination and characterization of the E. coli data. The soundness of the experimental results presented in this paper is supported by the statistically relevant relationships discovered between the chemico-physical description of proteins and the substitution cost matrix developed for the various discrimination systems.