Predict mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou's pseudo amino acid composition

Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
Journal of Theoretical Biology (Impact Factor: 2.12). 03/2012; 304:88-95. DOI: 10.1016/j.jtbi.2012.03.017
Source: PubMed


Mycobacterium tuberculosis (MTB) is a pathogenic bacterial species in the genus Mycobacterium and the causative agent of most cases of tuberculosis (Berman et al., 2000). Knowing where a mycobacterial protein is localized may help unravel its normal function, and automated prediction of mycobacterial protein subcellular localization is an important tool for genome annotation and drug discovery. In this work, a benchmark data set of 638 non-redundant mycobacterial proteins is constructed, and an approach for predicting mycobacterial protein subcellular localization is proposed that combines amino acid composition, dipeptide composition, reduced physicochemical properties, evolutionary information, and pseudo-average chemical shift. Using the increment of diversity algorithm combined with a support vector machine, the overall prediction accuracy is 87.77% for mycobacterial subcellular localizations and 85.03% for three membrane protein types in integral membranes. The pseudo-average chemical shift feature performs excellently. To check the performance of the method, the data set constructed by Rashid was also predicted, and an accuracy of 98.12% was obtained. These results suggest that the approach outperforms the existing methods reported in the literature.
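Two of the sequence features named in the abstract, amino acid composition and dipeptide composition, together with the increment of diversity measure, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, the increment of diversity is written in its commonly cited form ID(X, S) = D(X + S) - D(X) - D(S) with D(X) = N log N - sum(n_i log n_i), and the natural logarithm is an assumption. The reduced physicochemical, evolutionary, and chemical shift features, as well as the support vector machine step, are omitted.

```python
from itertools import product
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Map each of the 400 ordered residue pairs to a fixed vector position.
DIPEPTIDE_INDEX = {a + b: k for k, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def amino_acid_composition(seq):
    """Relative frequency of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * 20
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Relative frequency of each overlapping residue pair (400 values)."""
    seq = seq.upper()
    counts = [0] * 400
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in DIPEPTIDE_INDEX:  # skip pairs with non-standard symbols
            counts[DIPEPTIDE_INDEX[pair]] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 400
    return [c / total for c in counts]


def diversity(counts):
    """D(X) = N log N - sum(n_i log n_i) for a vector of non-negative counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * log(n) - sum(c * log(c) for c in counts if c > 0)


def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); smaller values mean more similar sources."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)
```

In a setup of this kind, the count vector of a query protein would be compared against the pooled count vector of each location class, and the resulting increment of diversity values could then be passed to a classifier as additional features. That pipeline is an assumption about typical usage and is not spelled out in the abstract.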


Available from: Guoliang Fan
    • "The online web server PSIPRED was used to obtain the predicted secondary structure of each protein sequence in the 76-protein fold class dataset. We calculated the ACS using a previously described method (Mielke and Krishnan, 2003; Fan and Li, 2012a; Fan and Li, 2012b; Fan et al., 2013; Fan and Li, 2013; Anaika et al., 2003). We selected chemical shift values of ¹Hα and ¹HN (two types of protein backbone atoms for every amino acid residue of protein sequence P) to calculate the corresponding ACS. "
    ABSTRACT: The recognition of protein folds is an important step in the prediction of protein structure and function. Recently, an increasing number of researchers have sought to improve the methods for protein fold recognition. Since Ding and Dubchak constructed a dataset consisting of 27 protein fold classes in 2001, the prediction algorithms, the feature parameters and the datasets used for protein fold prediction have all improved. In this study, we reorganized a dataset consisting of 76 fold classes constructed by Liu et al. and used the values of the increment of diversity, average chemical shifts of secondary structure elements and secondary structure motifs as feature parameters in the recognition of multi-class protein folds. With the combined feature vector as the input parameter for the Random Forests algorithm and an ensemble classification strategy, we propose a novel method to identify the 76 protein fold classes. The overall accuracy on the test dataset using an independent test was 66.69%; when the training and test sets were combined, with 5-fold cross-validation, the overall accuracy was 73.43%. This method was further used to predict the test dataset and the corresponding structural classification of the first 27-protein fold class dataset, resulting in overall accuracies of 79.66% and 93.40%, respectively. Moreover, when the training and test sets were combined, the accuracy using 5-fold cross-validation was 81.21%. Additionally, this approach resulted in improved prediction results using the 27-protein fold class dataset constructed by Ding and Dubchak.
    Full-text · Article · Dec 2015 · Saudi Journal of Biological Sciences
    • "Thus it is particularly useful for the analysis of a large amount of complicated protein sequences by means of the taxonomic approach. This methodology has been widely used to study various protein attributes, such as protein structural class ([2], [3], [81], [45], [22]), protein subcellular localization ([19], [14], [15], [74]), protein subnuclear localization ([70], [71], [52]), protein submitochondrial localization ([25]), protein oligomer type ([10]), conotoxin superfamily classification ([49], [46]), membrane protein type ([47], [70], [71], [80], [73], [16], [17], [7]), apoptosis protein subcellular localization ([4], [5], [58]), enzyme functional classification ([9], [11], [87], [75]), protein fold pattern ([72]), signal peptide ([18], [76]), and mycobacterial protein subcellular location ([29]). Some more recent work based on this principle includes ([59], [58], [26], [83], [68]). "
    ABSTRACT: Bioinformatics is a relatively new discipline in which mathematics is applied to the analysis of genetic sequences. The analysis of the genetic material of living organisms, which consists of the nucleic acids DNA and RNA, is of great importance for diagnosis and taxonomy. In the present paper we propose a new methodology for the representation of genetic sequences as fuzzy sets in the I^12 space, which can significantly improve the results of Sadegh-Zadeh and Torres & Nieto. An important characteristic of our proposed methodology is that the location of amino acids along the genetic sequences plays an important role, thus extending in a significant way the computational efficiency advantage of genetic sequence representation. We present some characteristic examples using the new proposed methodology where we calculate the distance and similarity degree of given polynucleotides.
    No preview · Article · Aug 2015 · Journal of Intelligent and Fuzzy Systems
    • "More information on the prediction of subcellular localization can be found in two comprehensive review papers (Chou and Shen, 2007; Nakai, 2000). Moreover, many new algorithms have been established for identifying subcellular localization in recent years (Chou and Shen, 2010a, 2010b; Chou et al., 2011, 2012; Fan and Li, 2012; Mei, 2012; Wan et al., 2013; Wu et al., 2011, 2012; Xiao et al., 2011a, 2011b). Although various experimental techniques and computational approaches have been developed and used to identify subcellular localizations of proteins (Chou and Shen, 2007; Dreger, 2003; Gygi et al., 1999; Nakai, 2000; Tsien, 1998), to date only a few attempts have been made to globally analyze proteins in different subcellular localizations (Drawid et al., 2000; Ghaemmaghami et al., 2003; Martin and MacNeill, 2004; Wang et al., 2013). "
    ABSTRACT: Proteins are responsible for performing the vast majority of cellular functions which are critical to a cell’s survival. The knowledge of the subcellular localization of proteins can provide valuable information about their molecular functions. Therefore, one of the fundamental goals in cell biology and proteomics is to analyze the subcellular localizations and functions of these proteins. Recent large-scale human genomics and proteomics studies have made it possible to characterize human proteins at a subcellular localization level. In this study, according to the annotation in Swiss-Prot, 8842 human proteins were classified into seven subcellular localizations. Human proteins in the seven subcellular localizations were compared by using topological properties, biological properties, codon usage indices, mRNA expression levels, protein complexity and physicochemical properties. All these properties were found to be significantly different in the seven categories. In addition, based on these properties and pseudo-amino acid compositions, a machine learning classifier was built for the prediction of protein subcellular localization. The study presented here was an attempt to address the aforementioned properties for comparing human proteins of different subcellular localizations. We hope our findings presented in this study may provide important help for the prediction of protein subcellular localization and for understanding the general function of human proteins in cells.
    Full-text · Article · Oct 2014 · Journal of Theoretical Biology