Article

Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies

College of Automation, Northwestern Polytechnical University, No. 127 Youyi West Road, Xi'an 710072, China.
Amino Acids (Impact Factor: 3.65). 06/2008; 34(4):565-72. DOI: 10.1007/s00726-007-0010-9
Source: PubMed

ABSTRACT The rapidly increasing number of sequence entering into the genome databank has called for the need for developing automated methods to analyze them. Information on the subcellular localization of new found protein sequences is important for helping to reveal their functions in time and conducting the study of system biology at the cellular level. Based on the concept of Chou's pseudo-amino acid composition, a series of useful information and techniques, such as residue conservation scores, von Neumann entropies, multi-scale energy, and weighted auto-correlation function were utilized to generate the pseudo-amino acid components for representing the protein samples. Based on such an infrastructure, a hybridization predictor was developed for identifying uncharacterized proteins among the following 12 subcellular localizations: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracell, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, and vacuole. Compared with the results reported by the previous investigators, higher success rates were obtained, suggesting that the current approach is quite promising, and may become a useful high-throughput tool in the relevant areas.

0 Followers
 · 
74 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To identify LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (in average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods.
    Journal of Theoretical Biology 09/2014; 356:213–222. DOI:10.1016/j.jtbi.2014.04.040 · 2.30 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In protein fold recognition, a protein is classified into one of its folds. The recognition of a protein fold can be done by employing feature extraction methods to extract relevant information from protein sequences and then by using a classifier to accurately recognize novel protein sequences. In the past, several feature extraction methods have been developed but with limited recognition accuracy only. Protein sequences of varying lengths share the same fold and therefore they are very similar (in a fold) if aligned properly. To this, we develop an amino acid alignment method to extract important features from protein sequences by computing dissimilarity distances between proteins. This is done by measuring distance between two respective position specific scoring matrices of protein sequences which is used in a support vector machine framework. We demonstrated the effectiveness of the proposed method on several benchmark datasets. The method shows significant improvement in the fold recognition performance which is in the range of 4.3% to 7.6% compared to several other existing feature extraction methods.
    Journal of Theoretical Biology 03/2014; 354. DOI:10.1016/j.jtbi.2014.03.033 · 2.30 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In protein fold recognition, a protein is classified into one of its folds. The recognition of a protein fold can be done by employing feature extraction methods to extract relevant information from protein sequences and then by using a classifier to accurately recognize novel protein sequences. In the past, several feature extraction methods have been developed but with limited recognition accuracy only. Protein sequences of varying lengths share the same fold and therefore they are very similar (in a fold) if aligned properly. To this, we develop an amino acid alignment method to extract important features from protein sequences by computing dissimilarity distances between proteins. This is done by measuring distance between two respective position specific scoring matrices of protein sequences which is used in a support vector machine framework. We demonstrated the effectiveness of the proposed method on several benchmark datasets. The method shows significant improvement in the fold recognition performance which is in the range of 4.3–7.6% compared to several other existing feature extraction methods.
    Journal of Theoretical Biology 01/2014; 354:137–145. · 2.30 Impact Factor