Real value prediction of solvent accessibility from amino acid sequence

RIKEN Tsukuba Institute, Ibaraki, Japan.
Proteins Structure Function and Bioinformatics (Impact Factor: 2.92). 03/2003; 50(4):629-35. DOI: 10.1002/prot.10328
Source: PubMed

ABSTRACT The solvent accessibility of amino acid residues has been predicted in the past by classifying them into exposure states with varying thresholds. This classification provides a wide range of values for the accessible surface area (ASA) within which a residue may fall. Thus far, no attempt has been made to predict real values of ASA from the sequence information without a priori classification into exposure states. Here, we present a new method with which to predict real value ASAs for residues, based on neighborhood information. Our real value prediction neural network could estimate the ASA for four different nonhomologous, nonredundant data sets of varying size, with 18.0-19.5% mean absolute error, defined as per residue absolute difference between the predicted and experimental values of relative ASA. Correlation between the predicted and experimental values ranged from 0.47 to 0.50. It was observed that the ASA of a residue could be predicted within a 23.7% mean absolute error, even when no information about its neighbors is included. Prediction of real values answers the issue of arbitrary choice of ASA state thresholds, and carries more information than category prediction. Prediction error for each residue type strongly correlates with the variability in its experimental ASA values.
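The two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes relative ASA values are given as percentages, one per residue. The function names, the toy values, and the 25% cutoff in `exposure_states` are hypothetical and do not come from the paper.

```python
from math import sqrt


def mean_absolute_error(predicted, observed):
    """Per-residue mean absolute difference between predicted and observed relative ASA."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between predicted and observed values."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_o)


def exposure_states(values, threshold):
    """Collapse real values into two states; the threshold is an arbitrary choice."""
    return ["exposed" if v > threshold else "buried" for v in values]


# Toy relative ASA values in percent (illustrative only, not data from the paper).
observed = [10.0, 50.0, 30.0, 80.0]
predicted = [20.0, 40.0, 30.0, 60.0]

mae = mean_absolute_error(predicted, observed)
r = pearson_correlation(predicted, observed)
states = exposure_states(predicted, threshold=25.0)  # hypothetical cutoff
```

Changing `threshold` changes the state labels without changing the underlying predictions, which is the arbitrariness that real value prediction avoids.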

    • "Typically, an amino acid with more than 20–25% accessible surface area is considered 'solvent accessible' (Zhang et al., 2009). Accessible surface area values were predicted using RVP-net, as provided by the dbPTM database (Ahmad et al., 2003; Lu et al., 2013). "
    ABSTRACT: A few small molecule oxidants, most notably hydrogen peroxide, can act as messengers in signal transduction. They trigger so-called 'thiol switches', cysteine residues that are reversibly oxidized to transiently change the functional properties of their host proteins. The proteome-wide identification of functionally relevant 'thiol switches' is of significant interest. Unfortunately, prediction of redox-active cysteine residues on the basis of surface accessibility and other computational parameters appears to be of limited use. Proteomic thiol labeling approaches remain the most reliable strategy to discover new thiol switches in a hypothesis-free manner. We discuss if and how genomic knock-in strategies can help establish the physiological relevance of a 'thiol switch' on the organismal level. We conclude that surprisingly few attempts have been made to thoroughly verify the physiological relevance of thiol-based redox switches in mammalian model organisms.
    Biological Chemistry 02/2015; 396(5). DOI:10.1515/hsz-2014-0314 · 2.69 Impact Factor
    • "The sequences are available online at data.tar.gz. The Manesh dataset was widely used by researchers to benchmark prediction methods (Garg et al., 2005; Ahmad et al., 2003; Wang et al., 2004; Xu et al., 2005; Ahmad & Gromiha, 2002; Gianese et al., 2003), and this motivated us to use it to design and validate our method. "
    ABSTRACT: In this paper, a new approach for prediction of protein solvent accessibility is presented. The prediction of relative solvent accessibility (RSA) gives helpful information for the prediction of the native structure of a protein. In recent years, several RSA prediction methods have been developed, including those that generate real values and those that predict discrete states (buried vs. exposed). We propose a novel method for real value prediction that aims at minimizing the prediction error when compared with existing methods. The proposed method is based on the Pace Regression (PR) predictor. The improved prediction quality is a result of the features of the PSIBLAST profile and of the PR method, because pace regression is optimal when the number of coefficients tends to infinity. The experimental results on the Manesh dataset show that the proposed method is an improvement in average prediction accuracy and training time.
    EXCLI Journal 01/2010; 8. · 0.73 Impact Factor
    • "However, this normalization has been performed in a number of different ways in the biological literature. We propose normalization of SA according to the method of Ahmad et al. (2003) to avoid semantic disparity and ensure the reliability of our database. • Incremental query processing. "
    ABSTRACT: A huge diversity of biological databases is available via the Internet, but many of these databases have been developed in an ad hoc manner rather than in accordance with any data management principles. In addition, in the area of disordered protein databases, many of the databases have not been made publicly available. This poses challenges to researchers, since reliable protein databases are required in order to test and measure the accuracy of protein structure prediction software. In this paper, we describe our work developing a disordered protein database using data from the protein secondary structure database DSSP-cont. In particular, we discuss the way in which we have addressed the issues of data cleaning, query processing and interoperability. This research is a pilot study in managing biological data.
    Database Technologies 2007. Proceedings of the Eighteenth Australasian Database Conference, ADC 2007, Ballarat, Victoria, Australia, January 29 - February 2, 2007, Proceedings; 01/2007