Predict mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou's pseudo amino acid composition

Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
Journal of Theoretical Biology (Impact Factor: 2.12). 03/2012; 304:88-95. DOI: 10.1016/j.jtbi.2012.03.017
Source: PubMed


Mycobacterium tuberculosis (MTB) is a pathogenic bacterial species in the genus Mycobacterium and the causative agent of most cases of tuberculosis (Berman et al., 2000). Knowing where a mycobacterial protein is localized can help unravel its normal function, and automated prediction of mycobacterial protein subcellular localization is an important tool for genome annotation and drug discovery. In this work, a benchmark data set of 638 non-redundant mycobacterial proteins is constructed, and an approach for predicting mycobacterial protein subcellular localization is proposed that combines amino acid composition, dipeptide composition, reduced physicochemical properties, evolutionary information, and pseudo-average chemical shift. Using the increment of diversity algorithm combined with a support vector machine, the overall prediction accuracy is 87.77% for mycobacterial subcellular localizations and 85.03% for three membrane protein types in integral membranes. The pseudo-average chemical shift feature performs excellently. To check the performance of the method, the data set constructed by Rashid was also predicted, and an accuracy of 98.12% was obtained. This indicates that the approach performs better than other existing methods in the literature.
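Two of the sequence features named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal Python sketch. It assumes the conventional definitions (relative frequency of each of the 20 standard residues, and of each of the 400 ordered adjacent residue pairs); the abstract does not give the exact formulation or normalization used in the paper, and the function names below are hypothetical. The remaining features (reduced physicochemical properties, evolutionary information, pseudo-average chemical shift) and the classifier are not covered here.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Return a 20-element list of residue frequencies (assumed definition)."""
    sequence = sequence.upper()
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in sequence:
        if residue in counts:  # skip non-standard symbols such as X or B
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return a 400-element list of adjacent-pair frequencies (assumed definition)."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def sequence_features(sequence):
    """Concatenate both compositions into one 420-dimensional feature vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not taken from the benchmark data set.
    features = sequence_features("MKVLAAGIVGLLLA")
    print(len(features))
```

A vector built this way could be passed to any standard classifier; concatenating the two compositions is only one simple way of combining features and is not claimed to be the combination scheme used in the paper.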



Available from: Guoliang Fan
    • "More information for prediction of subcellular localization was shown in two comprehensive review papers (Chou and Shen, 2007; Nakai, 2000). Moreover, many new algorithms were established for identifying the subcellular localization in recent years (Chou and Shen, 2010a, 2010b; Chou et al., 2011, 2012; Fan and Li, 2012; Mei, 2012; Wan et al., 2013; Wu et al., 2011, 2012; Xiao et al., 2011a, 2011b). Although various experimental techniques and computational approaches have been developed and used to identify subcellular localizations of proteins (Chou and Shen, 2007; Dreger, 2003; Gygi et al., 1999; Nakai, 2000; Tsien, 1998), however, to date, only a few attempts have been made to globally analyze proteins in different subcellular localizations (Drawid et al., 2000; Ghaemmaghami et al., 2003; Martin and MacNeill, 2004; Wang et al., 2013). "
    ABSTRACT: Proteins are responsible for performing the vast majority of cellular functions which are critical to a cell’s survival. The knowledge of the subcellular localization of proteins can provide valuable information about their molecular functions. Therefore, one of the fundamental goals in cell biology and proteomics is to analyze the subcellular localizations and functions of these proteins. Recent large-scale human genomics and proteomics studies have made it possible to characterize human proteins at a subcellular localization level. In this study, according to the annotation in Swiss-Prot, 8842 human proteins were classified into seven subcellular localizations. Human proteins in the seven subcellular localizations were compared by using topological properties, biological properties, codon usage indices, mRNA expression levels, protein complexity and physicochemical properties. All these properties were found to be significantly different in the seven categories. In addition, based on these properties and pseudo-amino acid compositions, a machine learning classifier was built for the prediction of protein subcellular localization. The study presented here was an attempt to address the aforementioned properties for comparing human proteins of different subcellular localizations. We hope our findings presented in this study may provide important help for the prediction of protein subcellular localization and for understanding the general function of human proteins in cells.
    Journal of Theoretical Biology 10/2014; 358:61–73. DOI:10.1016/j.jtbi.2014.05.008 · 2.12 Impact Factor
    • "Since the introduction of the protein subcellular localization prediction over two decades ago, a wide range of pattern recognition-based approaches have been proposed to solve this problem [7] [8] [9]. The performance of a pattern recognition technique to address protein subcellular localization prediction problem depends on the classification technique as well as features being used [10] [11] [12] [13] [14] [15]. "
    ABSTRACT: Protein subcellular localization is defined as predicting the functioning location of a given protein in the cell. It is considered an important step towards protein function prediction and drug design. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve protein subcellular localization prediction performance. However, relying solely on GO, this problem remains unsolved. At the same time, the impact of other sources of features especially evolutionary-based features has not been explored adequately for this task. In this study, we aim to extract discriminative evolutionary features to tackle this problem. To do this, we propose two segmentation based feature extraction methods to explore potential local evolutionary-based information for Gram-positive and Gram-negative subcellular localizations. We will show that by applying a Support Vector Machine (SVM) classifier to our extracted features, we are able to enhance Gram-positive and Gram-negative subcellular localization prediction accuracies by up to 6.4% better than previous studies including the studies that used GO for feature extraction.
    Journal of Theoretical Biology 09/2014; 364. DOI:10.1016/j.jtbi.2014.09.029 · 2.12 Impact Factor
    • "In developing a statistical method for predicting the cleavage sites (Chou, 1993) in proteins or their attributes (Chou, 1995), one of the important procedures was to formulate the protein or peptide samples with an effective mathematical expression that could truly reflect the intrinsic correlation with the desired target. To realize this, various different vectors were proposed (see, Cao, Xu & Liang, 2013; Chen & Li, 2013; Du et al., 2012; Esmaeili, Mohabatkar & Mohsenzadeh, 2010; Fan & Li, 2012; Khosravian et al., 2013; Liu et al., 2012; Mohabatkar et al., 2013; Mohabatkar, Mohammad Beigi & Esmaeili, 2011; Nanni et al., 2010; Wan, Mak & Kung, 2013; Yu et al., 2010; Zhang et al., 2008a; Zhou et al., 2007) to formulate proteins or peptides by extracting their different features into the pseudo amino acid composition (Chou, 2001a) or Chou's PseAAC (Lin & Lapointe, 2013). "
    ABSTRACT: As one of the most important and universal posttranslational modifications (PTMs) of proteins, S-nitrosylation (SNO) plays crucial roles in a variety of biological processes, including the regulation of cellular dynamics and many signaling events. Knowledge of SNO sites in proteins is very useful for drug development and basic research as well. Unfortunately, it is both time-consuming and costly to determine the SNO sites purely based on biological experiments. Facing the explosive protein sequence data generated in the post-genomic era, we are challenged to develop automated vehicles for timely and effectively determining the SNO sites for uncharacterized proteins. To address the challenge, a new predictor called iSNO-AAPair was developed by taking into account the coupling effects for all the pairs formed by the nearest residues and the pairs by the next nearest residues along protein chains. The cross-validation results on a state-of-the-art benchmark have shown that the new predictor outperformed the existing predictors. The same was true when tested by the independent proteins whose experimental SNO sites were known. A user-friendly web-server for iSNO-AAPair was established at, by which users can easily obtain their desired results without the need to follow the mathematical equations involved during its development.
    PeerJ 10/2013; 1(article e171):e171. DOI:10.7717/peerj.171 · 2.11 Impact Factor