Using the concept of Chou's pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies.

College of Automation, Northwestern Polytechnical University, No. 127 Youyi West Road, Xi'an 710072, China.
Amino Acids (Impact Factor: 3.65). 06/2008; 34(4):565-72. DOI: 10.1007/s00726-007-0010-9
Source: PubMed

ABSTRACT The rapidly increasing number of sequences entering genome databanks calls for automated methods to analyze them. Information on the subcellular localization of newly found protein sequences is important for revealing their functions in a timely manner and for studying systems biology at the cellular level. Based on the concept of Chou's pseudo-amino acid composition, a series of useful information sources and techniques, such as residue conservation scores, von Neumann entropies, multi-scale energy, and a weighted auto-correlation function, were used to generate the pseudo-amino acid components that represent the protein samples. On this basis, a hybridization predictor was developed for classifying uncharacterized proteins into the following 12 subcellular localizations: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracell, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, and vacuole. Compared with the results reported by previous investigators, higher success rates were obtained, suggesting that the current approach is quite promising and may become a useful high-throughput tool in the relevant areas.
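
As an illustration of the pseudo-amino acid composition idea named in the abstract, the following is a minimal Python sketch of the classical form: 20 amino acid frequencies extended by a few sequence-order correlation factors. It is not the feature set of this paper, which instead builds the extra components from conservation scores, von Neumann entropies, multi-scale energy, and weighted auto-correlation. The function name `pse_aac`, the parameters `lam` and `weight`, and the use of the Kyte-Doolittle hydropathy scale as the single residue property are all assumptions made for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as a stand-in residue property
# (assumption: the paper itself does not use this scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pse_aac(sequence, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    The first 20 components reflect amino acid frequencies; the last `lam`
    components reflect sequence-order correlation at gaps 1..lam.
    Characters outside the 20 standard residues are ignored.
    """
    seq = [r for r in sequence.upper() if r in HYDROPATHY]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    prop = _standardize(HYDROPATHY)
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(prop[seq[i]] - prop[seq[i + k]]) ** 2
                 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pse_aac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
    print(len(vector))
```

The resulting fixed-length vector can be fed to any standard classifier; the `weight` parameter controls how strongly the sequence-order terms count relative to the plain composition.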

  • Source
    ABSTRACT: Proteins must be located in the appropriate cellular compartments to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have focused on predicting not only single-location proteins but also multi-location proteins. However, most existing predictors are far from effective at handling multi-label proteins. This paper proposes an efficient multi-label predictor (namely mPLR-Loc) based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits information from the gene ontology (GO) database by using the protein's accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive-decision based multi-label penalized logistic regression classifier. Experimental results on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to rapidly and accurately predicting the subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for its prediction decisions. For readers' convenience, the mPLR-Loc server is available online. Copyright © 2014 Elsevier Inc. All rights reserved.
    Analytical Biochemistry 10/2014; 473. DOI:10.1016/j.ab.2014.10.014 · 2.31 Impact Factor
  • Source
    ABSTRACT: DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Because all biological species have developed from a very limited number of ancestral species, it is important to take evolutionary information into account in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. Comparison with existing methods via both the jackknife test and an independent dataset test showed that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach of extracting evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
    Journal of Biomolecular Structure & Dynamics 09/2014; DOI:10.1080/07391102.2014.968624 · 2.98 Impact Factor
  • Source
    ABSTRACT: As one of the most important posttranslational modifications (PTMs), ubiquitination plays an important role in regulating a variety of biological processes, such as signal transduction, cell division, apoptosis, and immune response. Ubiquitination is also named "lysine ubiquitination" because it occurs when a ubiquitin is covalently attached to lysine (K) residues of target proteins. Given an uncharacterized protein sequence that contains many lysine residues, which of them are ubiquitination sites, and which are not? With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable for both basic research and drug development to develop an automated method for rapidly and accurately annotating the ubiquitination sites in proteins. In view of this, a new predictor called "iUbiq-Lys" was developed based on evolutionary information, the grey system model, and the general form of pseudo amino acid composition. Rigorous cross-validation demonstrated that the new predictor remarkably outperformed all its counterparts. iUbiq-Lys is accessible to the public as a web server. For the convenience of most experimental scientists, we have further provided a step-by-step guide by which users can easily get their desired results without needing to follow the complicated mathematics that were presented in this paper just for the integrity of its development process.
    Journal of Biomolecular Structure & Dynamics 09/2014; DOI:10.1080/07391102.2014.968875 · 2.98 Impact Factor