Real value prediction of solvent accessibility from amino acid sequence

RIKEN Tsukuba Institute, Ibaraki, Japan.
Proteins Structure Function and Bioinformatics (Impact Factor: 2.63). 03/2003; 50(4):629-35. DOI: 10.1002/prot.10328
Source: PubMed


The solvent accessibility of amino acid residues has been predicted in the past by classifying them into exposure states with varying thresholds. This classification provides a wide range of values for the accessible surface area (ASA) within which a residue may fall. Thus far, no attempt has been made to predict real values of ASA from the sequence information without a priori classification into exposure states. Here, we present a new method with which to predict real value ASAs for residues, based on neighborhood information. Our real value prediction neural network could estimate the ASA for four different nonhomologous, nonredundant data sets of varying size, with 18.0-19.5% mean absolute error, defined as per residue absolute difference between the predicted and experimental values of relative ASA. Correlation between the predicted and experimental values ranged from 0.47 to 0.50. It was observed that the ASA of a residue could be predicted within a 23.7% mean absolute error, even when no information about its neighbors is included. Prediction of real values answers the issue of arbitrary choice of ASA state thresholds, and carries more information than category prediction. Prediction error for each residue type strongly correlates with the variability in its experimental ASA values.
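
The two quality measures quoted in the abstract can be illustrated with a minimal sketch. It assumes relative ASA is given as a percentage per residue; the function names, the toy values, and the 25% cutoff in `exposure_state` are illustrative assumptions, not taken from the paper, and the neural network itself is not reproduced here.

```python
import math


def mean_absolute_error(predicted, experimental):
    """Per-residue mean absolute difference between predicted and
    experimental relative ASA values (both in percent)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def exposure_state(relative_asa, threshold=25.0):
    """Category prediction for comparison: the threshold is an arbitrary
    choice (assumed here), which real value prediction avoids."""
    return "exposed" if relative_asa > threshold else "buried"


# Toy values (hypothetical), relative ASA in percent for four residues.
predicted = [10.0, 40.0, 55.0, 20.0]
experimental = [5.0, 50.0, 60.0, 30.0]

mae = mean_absolute_error(predicted, experimental)
r = pearson_correlation(predicted, experimental)
states = [exposure_state(v) for v in predicted]
```

With a category scheme, residues at 26% and 90% relative ASA both fall into the same "exposed" state, whereas the real value keeps the difference between them.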

    • "Many sequence-based methods were developed for predicting ASA (Adamczak, et al., 2004; Ahmad, et al., 2003; Cheng, et al., 2005; Dor and Zhou, 2007; Garg, et al., 2005; Yuan and Huang, 2004) and CN (Kinjo and Nishikawa, 2006; Pollastri, et al., 2002; Yuan, 2005). However, there is no method available for prediction of HSEα and only one (HSEpred) for the prediction of HSEβ (Song, et al., 2008)."
    ABSTRACT: Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function, and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN, and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere, and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability: The method is available at Contact: or
    Article · Nov 2015 · Bioinformatics
    • "The possible mean absolute error, given by the absolute difference between the predicted and experimental values of relative ASA per residue, was 18.0–19.5%, for each measurement (Ahmad, et al., 2003). The value of ASA was the percentage of the solvent-accessible area of each amino acid on the protein. "
    ABSTRACT: Availability: The MDD-SOH is now freely available to all interested users at All of the data set used in this work is also available for download in the website. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: TY LEE: (
    Article · Sep 2015 · Bioinformatics
    • "Typically, an amino acid with more than 20–25% accessible surface area is considered 'solvent accessible' (Zhang et al., 2009). Accessible surface area values were predicted using RVP-net, as provided by the dbPTM database (Ahmad et al., 2003; Lu et al., 2013). "
    ABSTRACT: A few small molecule oxidants, most notably hydrogen peroxide, can act as messengers in signal transduction. They trigger so-called 'thiol switches', cysteine residues that are reversibly oxidized to transiently change the functional properties of their host proteins. The proteome-wide identification of functionally relevant 'thiol switches' is of significant interest. Unfortunately, prediction of redox-active cysteine residues on the basis of surface accessibility and other computational parameters appears to be of limited use. Proteomic thiol labeling approaches remain the most reliable strategy to discover new thiol switches in a hypothesis-free manner. We discuss if and how genomic knock-in strategies can help establish the physiological relevance of a 'thiol switch' on the organismal level. We conclude that surprisingly few attempts have been made to thoroughly verify the physiological relevance of thiol-based redox switches in mammalian model organisms.
    Article · Feb 2015 · Biological Chemistry