Effective connectivity profile: a structural representation that evidences the relationship between protein structures and sequences.
ABSTRACT: The complexity of protein structures calls for simplified representations of their topology. The simplest possible mathematical description of a protein structure is a one-dimensional profile representing, for instance, buriedness or secondary structure. Representations of this kind have been introduced to study the sequence-to-structure relationship, with applications to fold recognition. Here we define the effective connectivity (EC) profile, a network-theoretical profile that self-consistently represents the network structure of the protein contact matrix. The EC profile makes the relationship between protein structure and protein sequence mathematically explicit, because it allows us to predict the average hydrophobicity profile (HP) and the distribution of amino acids at each site for families of homologous proteins sharing the same structure. In this sense, the EC provides an analytic solution to the statistical inverse folding problem, which consists of finding the statistical properties of the set of sequences compatible with a given structure. We tested these predictions through simulations of the structurally constrained neutral (SCN) model of protein evolution with structure conservation, for single- and multi-domain proteins and for a wide range of mutation processes that produce sequences with very different hydrophobicity profiles. The EC-based predictions are accurate even when only one sequence of the family is known. The EC profile is also highly significantly correlated with the HP for sequence-structure pairs in the PDB. It generalizes the properties of previously introduced structural profiles to modular proteins such as multidomain chains, and its correlation with the sequence profile is substantially stronger than that of previously defined profiles, particularly for long proteins.
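The abstract does not spell out the EC equation, so the sketch below does not reproduce it. Instead it illustrates a simpler self-consistent network profile of the same family: the principal eigenvector (PE) of a binary contact matrix, i.e. the non-negative vector satisfying λ c_i = Σ_j C_ij c_j. The 7.0 Å Cα distance cutoff, the minimum sequence separation of 2, and the helper names `contact_matrix` and `principal_eigenvector` are illustrative assumptions, not the authors' parameters or code.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_matrix(coords: Sequence[Coord], cutoff: float = 7.0,
                   min_sep: int = 2) -> List[List[int]]:
    """Binary contact matrix from C-alpha coordinates (illustrative definition).

    Assumption: residues i and j are in contact when |i - j| >= min_sep
    and their C-alpha distance is <= cutoff (in angstroms).
    """
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                c[i][j] = c[j][i] = 1
    return c


def principal_eigenvector(c: List[List[int]], max_iter: int = 10_000,
                          tol: float = 1e-12) -> List[float]:
    """Power iteration for the principal eigenvector of a symmetric contact matrix.

    Iterating on C + I instead of C leaves the eigenvectors unchanged but
    prevents oscillation when the contact graph is bipartite.
    The returned vector is non-negative and has unit Euclidean norm.
    """
    n = len(c)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        # w = (C + I) v
        w = [v[i] + sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v
```

A design note: for a chain made of weakly coupled domains, the principal eigenvector of the whole contact matrix can concentrate on whichever domain has the largest leading eigenvalue, which illustrates why a profile intended for modular, multidomain proteins needs a different construction.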
Furthermore, the EC profile has a dynamic interpretation: the EC components are strongly anticorrelated with the temperature factors measured in X-ray experiments, meaning that positions with large EC components are more strongly constrained in their equilibrium dynamics. The EC profile also yields a natural measure of modularity that correlates with the number of domains composing the protein, suggesting its application to domain decomposition. Finally, we show that structurally similar proteins have similar EC profiles, so that the similarity between aligned EC profiles can be used as a structure similarity measure, a property that we have recently applied to protein structure alignment. The code for computing the EC profile is available on request by writing to firstname.lastname@example.org, and the structural profiles discussed in this article can be downloaded from the SLOTH webserver http://www.fkp.tu-darmstadt.de/SLOTH/.
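The abstract does not state which score is used to compare aligned EC profiles. As a minimal sketch, assuming a Pearson correlation over aligned positions, the functions below compare two profiles written on a common alignment, with gaps marked as `None`. The same `pearson` helper can be applied to an EC profile and the X-ray temperature factors of the same chain, where the abstract's claim predicts a clearly negative value. The names `pearson` and `aligned_profile_similarity` are hypothetical.

```python
import math
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def aligned_profile_similarity(p: Sequence[Optional[float]],
                               q: Sequence[Optional[float]]) -> float:
    """Hypothetical similarity score: Pearson correlation over aligned columns.

    p and q are two structural profiles written on the same alignment,
    with None marking a gap; columns with a gap in either profile are skipped.
    """
    if len(p) != len(q):
        raise ValueError("aligned profiles must have the same length")
    pairs = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])
```

For example, `aligned_profile_similarity([0.1, None, 0.3, 0.5], [0.2, 0.9, 0.6, 1.0])` skips the gapped column and returns a value close to 1.0, because the remaining aligned values are proportional.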