HSEpred: predict half-sphere exposure from protein sequences

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
Bioinformatics (Impact Factor: 4.62). 08/2008; 24(13):1489-97. DOI: 10.1093/bioinformatics/btn222
Source: PubMed

ABSTRACT Half-sphere exposure (HSE) is a newly developed two-dimensional solvent exposure measure. By conceptually separating an amino acid's sphere in a protein structure into two half spheres, which represent its distinct spatial neighborhoods in the upward and downward directions, the HSE-up and HSE-down measures show superior performance compared with other measures such as accessible surface area, residue depth and contact number. However, no method currently exists for predicting HSE measures from sequence data.
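To make the half-sphere idea concrete, here is a minimal sketch of how HSE-up and HSE-down could be counted from atomic coordinates. It is not the authors' code. It assumes that the Cα-to-Cβ vector marks the upward direction and that the sphere radius is 13 Å; neither detail is stated in this abstract. The function name `half_sphere_exposure` is hypothetical.

```python
import math


def half_sphere_exposure(ca_coords, cb_coords, radius=13.0):
    """Return one (hse_up, hse_down) pair per residue.

    ca_coords, cb_coords: lists of (x, y, z) tuples, one per residue.
    radius: sphere radius in angstroms (13.0 is an assumed default).
    Residues without a C-beta atom (glycine) would need a substitute
    direction; that case is not handled in this sketch.
    """
    results = []
    for i, (ca_i, cb_i) in enumerate(zip(ca_coords, cb_coords)):
        # Assumption: the C-alpha -> C-beta vector defines "up".
        up = tuple(b - a for a, b in zip(ca_i, cb_i))
        hse_up = 0
        hse_down = 0
        for j, ca_j in enumerate(ca_coords):
            if i == j:
                continue
            # Vector from the central C-alpha to the neighboring C-alpha.
            diff = tuple(q - p for p, q in zip(ca_i, ca_j))
            if math.sqrt(sum(d * d for d in diff)) > radius:
                continue  # outside the sphere
            # The sign of the dot product selects the half sphere.
            if sum(u * d for u, d in zip(up, diff)) >= 0:
                hse_up += 1
            else:
                hse_down += 1
        results.append((hse_up, hse_down))
    return results
```

Under this definition, the two counts for a residue add up to the number of Cα atoms inside the sphere, which is a contact number for the same radius.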
In this article, we propose a novel approach to predict the HSE measures and infer residue contact numbers using the predicted HSE values, based on a well-prepared non-homologous protein structure dataset. In particular, we employ support vector regression (SVR) to quantify the relationship between HSE measures and protein sequences and evaluate its prediction performance. We extensively explore five sequence-encoding schemes to examine their effects on the prediction performance. Our method achieves correlation coefficients of 0.72 and 0.68 between the predicted and observed HSE-up and HSE-down measures, respectively. Moreover, contact number can be accurately predicted by summing the predicted HSE-up and HSE-down values, which further broadens the applicability of this method. The successful application of the SVR approach in this study suggests that it should prove useful in quantifying the protein sequence-structure relationship and predicting structural property profiles from protein sequences.
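The pieces around the regression step can be sketched as follows. This is an illustration only, under stated assumptions: the one-hot sliding-window encoding and the window size of 15 are hypothetical stand-ins (the abstract does not describe its five encoding schemes), and the helper names `encode_windows`, `pearson` and `contact_numbers` are invented here. The SVR fit itself is omitted; a library regressor such as scikit-learn's `sklearn.svm.SVR` could be trained on the encoded rows.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_windows(sequence, window=15):
    """One-hot encode a sliding window centered on each residue.

    Assumed scheme, not taken from the paper. `window` should be odd.
    Positions beyond the sequence ends, and unknown residue letters,
    are encoded as all zeros.
    """
    half = window // 2
    features = []
    for center in range(len(sequence)):
        row = []
        for pos in range(center - half, center + half + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence):
                idx = AMINO_ACIDS.find(sequence[pos])
                if idx >= 0:
                    one_hot[idx] = 1
            row.extend(one_hot)
        features.append(row)
    return features


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def contact_numbers(hse_up, hse_down):
    """Infer per-residue contact numbers as HSE-up + HSE-down."""
    return [u + d for u, d in zip(hse_up, hse_down)]
```

In use, each row from `encode_windows` would be the input for one residue, predicted values would be compared with observed ones through `pearson`, and `contact_numbers` would be applied to the two predicted profiles.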
The prediction webserver and supplementary materials are accessible at
Supplementary data are available at Bioinformatics online.

Available from: Jiangning Song, Jul 05, 2015
  • ABSTRACT: Solvent exposure of amino acids measures how deeply residues are buried in the tertiary structure of proteins, and hence it provides important information for analyzing and predicting protein structure and function. Existing methods of calculating solvent exposure, such as accessible surface area, relative accessible surface area, residue depth, contact number, and half-sphere exposure, still have some limitations. In this article, we propose a novel solvent exposure measure named quadrant-sphere exposure (QSE) based on eight quadrants derived from a spherical neighborhood. The proposed measure forms a microenvironment around the Cα atom as a sphere with a radius of 13 Å, and subdivides it into eight quadrants according to a rectangular coordinate system constructed from the geometric relationships of backbone atoms. The number of neighboring Cα atoms whose labels are the same is given as the QSE value of the center Cα atom at hand. As evidenced by histograms that show very different distributions for different structure configurations, the proposed measure captures local properties that are characteristic of a residue's eight-directional neighborhood within a sphere. Compared with other measures, QSE provides a different view of solvent exposure, and provides information that is specific to different tertiary structures. As the experimental results show, the QSE measure can potentially be used in protein structure analysis and prediction.
    Proteomics 10/2011; 11(19):3793-801. DOI:10.1002/pmic.201100189 · 3.97 Impact Factor
  • ABSTRACT: Mitochondrial proteins of Plasmodium falciparum are considered attractive targets for anti-malarial drugs, but the experimental identification of these proteins is a difficult and time-consuming task. Computational prediction of mitochondrial proteins offers an alternative approach. However, the commonly used subcellular location prediction methods are unsuited to P. falciparum mitochondrial proteins, whereas the organism- and organelle-specific methods were constructed on the basis of a rather small dataset. In this study, a novel dataset termed PfM233, which included 108 mitochondrial and 125 non-mitochondrial proteins with sequence similarity below 25%, was established, and methods for predicting mitochondrial proteins of P. falciparum were described. Both bi-profile Bayes and split amino acid composition were applied to extract features from the N- and C-terminal sequences of these proteins, which were then used to construct two SVM-based classifiers (PfMP-N25 and PfMP-30). Using PfM233 as the dataset, PfMP-N25 and PfMP-30 achieved accuracies (MCCs) of 90.13% (0.80) and 90.99% (0.82), respectively. When tested with the commonly used 40 mitochondrial proteins in PfM175 and the 108 mitochondrial proteins in PfM233, these two methods clearly outperformed the existing general, organelle-specific, and organism- and organelle-specific methods.
    Biochimie 04/2011; 93(4):778-82. DOI:10.1016/j.biochi.2011.01.013 · 3.12 Impact Factor
  • ABSTRACT: In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. Membrane proteins are a special group of proteins that account for approximately 30% of all proteins; however, solved membrane protein structures represent less than 1% of known protein structures to date. Although great success has been achieved in developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, much challenging work remains in this regard. In this review article, we first summarize the recent progress in developing automated methods for predicting protein secondary structures, especially in membrane proteins; we then outline some future directions in this research field.
    Expert Review of Proteomics 11/2008; 5(5):653-62. DOI:10.1586/14789450.5.5.653 · 3.54 Impact Factor