Real value prediction of solvent accessibility from amino acid sequence.

RIKEN Tsukuba Institute, Ibaraki, Japan.
Proteins Structure Function and Bioinformatics (Impact Factor: 2.63). 03/2003; 50(4):629-35. DOI: 10.1002/prot.10328
Source: PubMed


The solvent accessibility of amino acid residues has been predicted in the past by classifying them into exposure states with varying thresholds. This classification provides a wide range of values for the accessible surface area (ASA) within which a residue may fall. Thus far, no attempt has been made to predict real values of ASA from the sequence information without a priori classification into exposure states. Here, we present a new method with which to predict real value ASAs for residues, based on neighborhood information. Our real value prediction neural network could estimate the ASA for four different nonhomologous, nonredundant data sets of varying size, with 18.0-19.5% mean absolute error, defined as per residue absolute difference between the predicted and experimental values of relative ASA. Correlation between the predicted and experimental values ranged from 0.47 to 0.50. It was observed that the ASA of a residue could be predicted within a 23.7% mean absolute error, even when no information about its neighbors is included. Prediction of real values answers the issue of arbitrary choice of ASA state thresholds, and carries more information than category prediction. Prediction error for each residue type strongly correlates with the variability in its experimental ASA values.
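The two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, assuming relative ASA is expressed as a percentage per residue. The function names, the sample values, and the 25% cutoff are hypothetical and are not taken from the paper. The last helper shows how a real-valued prediction can be converted to exposure states at any chosen threshold after the fact, which is the sense in which real values carry more information than a category.

```python
import math


def mean_absolute_error(predicted, experimental):
    """Per-residue mean absolute difference between two lists of relative ASA (%)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def exposure_states(relative_asa, threshold):
    """Label each residue 'exposed' if its relative ASA exceeds the threshold."""
    return ["exposed" if v > threshold else "buried" for v in relative_asa]


# Hypothetical relative ASA values (%), for illustration only.
predicted = [10.0, 40.0, 70.0, 25.0]
experimental = [20.0, 30.0, 90.0, 25.0]

mae = mean_absolute_error(predicted, experimental)
r = pearson_correlation(predicted, experimental)
states = exposure_states(predicted, threshold=25.0)  # 25% is an assumed cutoff
```

With real values in hand, the same predictions can be re-labelled under a different threshold by calling `exposure_states` again, without retraining or re-predicting.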

  • Source
    • "The possible mean absolute error, given by the absolute difference between the predicted and experimental values of relative ASA per residue, was 18.0–19.5%, for each measurement (Ahmad, et al., 2003). The value of ASA was the percentage of the solvent-accessible area of each amino acid on the protein. "
    ABSTRACT: Availability: The MDD-SOH is now freely available to all interested users at All of the data set used in this work is also available for download in the website. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: TY LEE: (
    Bioinformatics 09/2015; DOI:10.1093/bioinformatics/btv558 · 4.98 Impact Factor
  • Source
    • "Typically, an amino acid with more than 20–25% accessible surface area is considered 'solvent accessible' (Zhang et al., 2009). Accessible surface area values were predicted using RVP-net, as provided by the dbPTM database (Ahmad et al., 2003; Lu et al., 2013). "
    ABSTRACT: Abstract A few small molecule oxidants, most notably hydrogen peroxide, can act as messengers in signal transduction. They trigger so-called 'thiol switches', cysteine residues that are reversibly oxidized to transiently change the functional properties of their host proteins. The proteome-wide identification of functionally relevant 'thiol switches' is of significant interest. Unfortunately, prediction of redox-active cysteine residues on the basis of surface accessibility and other computational parameters appears to be of limited use. Proteomic thiol labeling approaches remain the most reliable strategy to discover new thiol switches in a hypothesis-free manner. We discuss if and how genomic knock-in strategies can help establish the physiological relevance of a 'thiol switch' on the organismal level. We conclude that surprisingly few attempts have been made to thoroughly verify the physiological relevance of thiol-based redox switches in mammalian model organisms.
    Biological Chemistry 02/2015; 396(5). DOI:10.1515/hsz-2014-0314 · 3.27 Impact Factor
  • Source
    • "B. Input Features We gathered the most comprehensive and independent set of residue level input features, which is capable of capturing the sequence information, evolutionary information as well as the structural information. The residue level information includes: (a) single valued amino acid type (all the necessary information for the correct folding of a protein is encoded in its amino acid sequence [26]); (b) seven physicochemical properties of amino acid (different types, short or long, disordered regions in protein are found to have distinguished physicochemical properties); (c) twenty PSSM's (position specific scoring matrix) indicating the evolutionary information accumulated in each residue position of a protein sequence; (d) three predicted secondary structure (helix, strand and coil) probabilities from SPINE-X [27], one predicted accessible surface area (ASA) normalized by the ASA of an extended conformation (Ala-X- Ala) [28] and two predicted backbone torsion angle (phi, psi) fluctuations [29] since disordered residues are characterized by lack of stable secondary structure [30], highly exposed area and angle fluctuations; (e) one monogram and twenty bigrams computed from PSSM [31] representing the conserved evolutionary information of PSSM transformed from primary structure level to three dimensional structure level, which are normalized by the median of normal density distribution of monogram and bigram values in their logarithmic space; (f) one indicator for terminal residues (five residues from "
    ABSTRACT: Intrinsically disordered regions (IDRs) or proteins (IDPs) are associated with important biological functions while lacking stable structure in their native state. Disordered proteins and residues are abundant in nature and are extensively involved in critical human diseases, and hence affect drug discovery. Thus, disorder prediction is becoming crucial in proteomic research. The large-scale growth of genome databases demands high-performance computational methods for identification of protein disorder. We developed a canonical support vector machine based disorder predictor, DisPredict, by integrating an RBF kernel. It employs a novel feature set for accurate characterization of disorder and outperformed two leading predictors, the neural network based SPINE-D and the meta predictor MFDp, based on tenfold cross validation. We propose a post-processing of probabilities to further improve the accuracy, named DisPredict1.1, which yields further improved performance both in binary annotation and in real-valued probability prediction per residue, in both short and long disordered regions. It provides the highest Matthews Correlation Coefficient (MCC), a competitive Area Under the receiver operating characteristic Curve (AUC) and the lowest Mean Absolute Error (MAE) when compared with twenty existing predictors of several kinds on an independent benchmark dataset. DisPredict is available online.
    17th International Conference on Computer and Information Technology (ICCIT), 2014; 12/2014