Article

Amino acid pairing at the N- and C-termini of helical segments in proteins.

IBMC and LIACC, R. Campo Alegre, 1021/1055, 4169-007 Porto, Portugal.
Proteins: Structure, Function, and Bioinformatics (impact factor: 3.39). 02/2008; 70(1):188-96. DOI:10.1002/prot.21525
Source: PubMed

ABSTRACT A systematic survey was carried out in an unbiased sample of 815 protein chains with a maximum of 20% homology selected from the Protein Data Bank, whose structures were solved at a resolution higher than 1.6 Å and with an R-factor lower than 25%. A set of 5556 subsequences with alpha-helix or 3(10)-helix motifs was extracted from these protein chains. Global and local propensities were then calculated for all possible amino acid pairs of the type (i, i + 1), (i, i + 2), (i, i + 3), and (i, i + 4), starting at the relevant helical positions N1, N2, N3, C3, C2, C1, and N-int (interior positions), and also at the first nonhelical positions at both termini of the helices, namely, N-cap and C-cap. The statistical analysis of the propensity values shows that pairing depends significantly on the type of the amino acids and on the position of the pair. A few sequences of three and four amino acids were selected, and their high prevalence in helices is outlined in this work. The Glu-Lys-Tyr-Pro sequence shows a peculiar distribution in proteins, which may suggest a relevant structural role in alpha-helices when Pro is located at the C-cap position. A bioinformatics tool was developed that automatically and periodically updates the results and makes them available on a web site.
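The abstract does not state the formula used for the propensities, so the following is only a minimal sketch of how a position-specific pair propensity of the type (i, i + k) could be computed. It assumes a simple observed/expected ratio, where the expected frequency is the product of the background frequencies of the two residues in the whole data set; this definition, the function names `anchor_index` and `pair_propensity`, and the toy input are illustrative assumptions, not the authors' method or data. Helix boundaries are taken as given (0-based indices of the first and last helical residues), N-cap and C-cap are the first nonhelical residues on either side as described above, and interior (N-int) positions are left out for brevity.

```python
from collections import Counter


def anchor_index(start, end, position):
    """Return the residue index of a named helix position.

    start and end are 0-based indices of the first and last helical
    residues. N-cap and C-cap are the first nonhelical residues.
    """
    offsets = {
        "N-cap": start - 1, "N1": start, "N2": start + 1, "N3": start + 2,
        "C3": end - 2, "C2": end - 1, "C1": end, "C-cap": end + 1,
    }
    return offsets[position]


def pair_propensity(chains, position, k):
    """Observed/expected ratio for pairs (i, i + k) anchored at `position`.

    chains: list of (sequence, [(start, end), ...]) tuples.
    Assumed definition (not taken from the paper): observed pair fraction
    divided by the product of the background residue frequencies.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for seq, helices in chains:
        residue_counts.update(seq)
        for start, end in helices:
            i = anchor_index(start, end, position)
            j = i + k
            if i >= 0 and j < len(seq):  # skip pairs that run off the chain
                pair_counts[(seq[i], seq[j])] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    propensity = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        expected = (residue_counts[a] / total_residues) * (
            residue_counts[b] / total_residues
        )
        propensity[(a, b)] = observed / expected
    return propensity


if __name__ == "__main__":
    # Toy chain (made up): helix spans E-K-Y, so P sits at the C-cap.
    toy = [("AEKYPA", [(1, 3)])]
    print(pair_propensity(toy, "C3", 3))
```

A ratio above 1 would indicate that a pair occurs at that position more often than the background composition alone predicts; with real data, low counts would need a significance test before any such reading.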

  • Source
    Article: Exploring ORFan Domains in Giant Viruses: Structure of Mimivirus Sulfhydryl Oxidase R596.
    ABSTRACT: The mimivirus genome contains many genes that lack homologs in the sequence database and are thus known as ORFans. In addition, mimivirus genes that encode proteins belonging to known fold families are in some cases fused to domain-sized segments that cannot be classified. One such ORFan region is present in the mimivirus enzyme R596, a member of the Erv family of sulfhydryl oxidases. We determined the structure of a variant of full-length R596 and observed that the carboxy-terminal region of R596 assumes a folded, compact domain, demonstrating that these ORFan segments can be stable structural units. Moreover, the R596 ORFan domain fold is novel, hinting at the potential wealth of protein structural innovation yet to be discovered in large double-stranded DNA viruses. In the context of the R596 dimer, the ORFan domain contributes to formation of a broad cleft enriched with exposed aromatic groups and basic side chains, which may function in binding target proteins or localization of the enzyme within the virus factory or virions. Finally, we find evidence for an intermolecular dithiol/disulfide relay within the mimivirus R596 dimer, the first such extended, intersubunit redox-active site identified in a viral sulfhydryl oxidase.
    PLoS ONE 01/2012; 7(11):e50649. · 4.09 Impact Factor
  • Source
    Article: Folding by numbers: primary sequence statistics and their use in studying protein folding.
    ABSTRACT: The exponential growth over the past several decades in the quantity of both primary sequence data available and the number of protein structures determined has provided a wealth of information describing the relationship between protein primary sequence and tertiary structure. This growing repository of data has served as a prime source for statistical analysis, where underlying relationships between patterns of amino acids and protein structure can be uncovered. Here, we survey the main statistical approaches that have been used for identifying patterns within protein sequences, and discuss sequence pattern research as it relates to both secondary and tertiary protein structure. Limitations to statistical analyses are discussed, and a context for their role within the field of protein folding is given. We conclude by describing a novel statistical study of residue patterning in beta-strands, which finds that hydrophobic (i,i+2) pairing in beta-strands occurs more often than expected at locations near strand termini. Interpretations involving beta-sheet nucleation and growth are discussed.
    International Journal of Molecular Sciences 05/2009; 10(4):1567-89. · 2.60 Impact Factor
  • Source
    Article: Position-specific propensities of amino acids in the β-strand.
    ABSTRACT: Despite the importance of β-strands as main building blocks in proteins, the propensities of amino acids in β-strands are not well understood, as they have been more difficult to determine experimentally than those in α-helices. Recent studies have shown that most of the amino acids have significantly high or low propensity towards both ends of β-strands. However, a comprehensive analysis of the sequence-dependent amino acid propensities at positions between the ends of the β-strand has not been carried out. The propensities of the amino acids calculated from a large non-redundant database of proteins are found to be highly position-specific and vary continuously throughout the length of the β-strand. They follow an unexpected characteristic periodic pattern in inner positions with respect to the cap residues in both termini of β-strands; this periodic nature is markedly different from that of the α-helices with respect to the strength and pattern in periodicity. This periodicity is not only different for different amino acids but also varies considerably for amino acids belonging to the same physico-chemical group. Average hydrophobicity is also found to be periodic with respect to the positions from both termini of β-strands. The results contradict the earlier perception of the isotropic nature of amino acid propensities in the middle region of β-strands. These position-specific propensities should be of immense help in understanding the factors responsible for β-strand design and efficient prediction of β-strand structure in unknown proteins.
    BMC Structural Biology 09/2010; 10:29. · 2.48 Impact Factor

Keywords

815 protein chains
alpha-helices
bioinformatics tool
C-cap position
first nonhelical positions
Glu-Lys-Tyr-Pro sequence
interior positions
local propensities
peculiar distribution
periodically
possible amino acid pairs
protein chains
Protein Data Bank
R-factor lower
relevant helical positions N1
relevant structural role
resolution higher
statistical analysis
unbiased sample
web site