Mem-PHybrid: Hybrid features-based prediction system for classifying membrane protein types

Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Nilore, Islamabad, Pakistan.
Analytical Biochemistry (Impact Factor: 2.22). 02/2012; 424(1):35-44. DOI: 10.1016/j.ab.2012.02.007
Source: PubMed


Membrane proteins are a major class of proteins and are encoded by approximately 20% to 30% of genes in most organisms. In this work, a novel two-layer membrane protein prediction system, called Mem-PHybrid, is proposed. In the first layer, it identifies whether the query protein is a membrane or nonmembrane protein. In the second layer, it identifies the type of membrane protein. Mem-PHybrid is based on hybrid features, formed by fusing physicochemical features with split amino acid composition-based features. This enables Mem-PHybrid to exploit the discrimination capabilities of both feature extraction strategies. In addition, minimum redundancy maximum relevance (mRMR) feature selection is applied to reduce the dimensionality of the feature vector. We employ random forest, evidence-theoretic K-nearest neighbor, and support vector machine (SVM) as classifiers and analyze their performance on two datasets. SVM using hybrid features yields the highest accuracy of 89.6% and 97.3% on dataset1 and 91.5% and 95.5% on dataset2 for jackknife and independent dataset tests, respectively. The enhanced prediction performance of Mem-PHybrid is largely attributed to the discrimination power of the hybrid features and the learning capability of SVM. Mem-PHybrid is accessible at http://www.
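
As an illustration of the split amino acid composition idea mentioned in the abstract, the following is a minimal sketch in Python. It is not the authors' implementation: the function names, the 20-letter alphabet ordering, and the terminal segment length of 25 residues are all assumptions chosen for the example. The sketch splits a sequence into N-terminal, middle, and C-terminal segments and computes the amino acid composition of each, giving a 60-dimensional vector.

```python
from collections import Counter

# The 20 standard amino acids in a fixed (assumed) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the fraction of each standard amino acid in a segment."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(segment)
    # Non-standard residues count toward the length but not toward any feature.
    return [counts[aa] / len(segment) for aa in AMINO_ACIDS]


def split_aac(sequence, terminal_len=25):
    """Split amino acid composition: N-terminal, middle, C-terminal (60 values).

    terminal_len is a hypothetical default; the abstract does not state
    the segment lengths used by Mem-PHybrid.
    """
    seq = sequence.upper()
    # For short sequences, fall back to three roughly equal parts.
    if len(seq) <= 2 * terminal_len:
        terminal_len = len(seq) // 3
    n_term = seq[:terminal_len]
    middle = seq[terminal_len:len(seq) - terminal_len]
    c_term = seq[len(seq) - terminal_len:]
    return composition(n_term) + composition(middle) + composition(c_term)


if __name__ == "__main__":
    features = split_aac("MKTLLLTLVVVTIVCLDLGYT" * 4)
    print(len(features))
```

Under the same assumptions, a hybrid vector could be formed by concatenating this list with a separately computed list of physicochemical features before feature selection and classification.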



Available from: Asifullah Khan, Apr 02, 2014
  • Source
    • "Later, parameters of ET-KNN are optimized to reduce the output of cost functions. Shen and Chou have utilized the concept of OET-KNN for enhancing the predicted outcomes of membrane protein types and Subcellular localization (Hayat and Khan, 2012b; Shen and Chou, 2005). Further, it is also applied for protein fold pattern recognition (Shen and Chou, 2006). "
    ABSTRACT: Proteins are the executants of biological functions in living organisms. Comprehension of protein structure is a challenging problem in the era of proteomics, computational biology, and bioinformatics because of its pivotal role in protein folding patterns. Owing to the large exploration of protein sequences in protein databanks and intricacy of protein structures, experimental and theoretical methods are insufficient for prediction of protein structure classes. Therefore, it is highly desirable to develop an accurate, reliable, and high throughput computational model to predict protein structure classes correctly from polygenetic sequences. In this regard, we propose a promising model employing hybrid descriptor space in conjunction with optimized evidence-theoretic K-nearest neighbor algorithm. Hybrid space is the composition of two descriptor spaces including Multi-profile Bayes and bi-gram probability. In order to enhance the generalization power of the classifier, we have selected high discriminative descriptors from the hybrid space using particle swarm optimization, a well-known evolutionary feature selection technique. Performance evaluation of the proposed model is performed using the jackknife test on three low similarity benchmark datasets including 25PDB, 1189, and 640. The success rates of the proposed model are 87.0%, 86.6%, and 88.4%, respectively on the three benchmark datasets. The comparative analysis exhibits that our proposed model has yielded promising results compared to the existing methods in the literature. In addition, our proposed prediction system might be helpful in future research particularly in cases where the major focus of research is on low similarity datasets.
    Full-text · Article · Dec 2013 · Journal of Theoretical Biology
  • Source
    • "These interactions are often useful for the stabilization of protein's 3D structures. Polar amino acids are hydrophilic, whereas non-polar amino acids are hydrophobic, which are used to twist protein into useful shapes (Hayat and Khan 2012). In this study, the TM protein sequences are replicated into five sequences and then each amino acid is replaced with its corresponding property. "
    ABSTRACT: Membrane protein is the prime constituent of a cell, which performs a role of mediator between intra and extracellular processes. The prediction of transmembrane (TM) helix and its topology provides essential information regarding the function and structure of membrane proteins. However, prediction of TM helix and its topology is a challenging issue in bioinformatics and computational biology due to experimental complexities and lack of its established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using compositional index and physicochemical properties. The redundant and irrelevant features are eliminated through singular value decomposition. The selected features provided by these feature extraction strategies are then fused to develop a hybrid model. Weighted random forest is adopted as a classification approach. We have used two benchmark datasets including low and high-resolution datasets. Tenfold cross-validation is employed to assess the performance of WRF-TMH model at different levels including per protein, per segment, and per residue. The success rates of WRF-TMH model are quite promising and are the best reported so far on the same datasets. It is observed that WRF-TMH model might play a substantial role, and will provide essential information for further structural and functional studies on membrane proteins. The accompanied web predictor is accessible at .
    Full-text · Article · Mar 2013 · Amino Acids
  • ABSTRACT: Lysine acetylation is a reversible post-translational modification (PTM) which has been linked to many biological and pathological implications. Hence, localization of lysine acetylation is essential for deciphering the mechanism of such implications. Whereas many acetylated lysines in human proteins have been localized through experimental approaches in the wet lab, this work remains incomplete. In the present study, we proposed a novel feature extraction approach, bi-relative adapted binomial score Bayes (BRABSB), combined with support vector machines (SVMs) to construct a human-specific lysine acetylation predictor, which yields, on average, a sensitivity of 83.91%, a specificity of 87.25% and an accuracy of 85.58%, in the case of 5-fold cross validation experiments. Results obtained through the validation on independent data sets show that the proposed approach outperforms other existing lysine acetylation predictors. Furthermore, due to the fact that global analysis of human lysine acetylproteins, which would ultimately facilitate the systematic investigation of the biological and pathological consequences associated with lysine acetylation events, remains to be resolved, we made an attempt to systematically analyze human lysine acetylproteins, demonstrating their diversity with respect to subcellular localization as well as biological process and predominance by "binding" in terms of molecular function. Our analysis also revealed that human lysine acetylproteins are significantly enriched in neurodegenerative disorders and cancer pathways. Remarkably, lysine acetylproteins in mitochondria are significantly related to neurodegenerative disorders and those in the nucleus are instead significantly involved in pathways in cancers, all of which might ultimately provide novel global insights into such pathological processes for the therapeutic purpose. The web server is deployed at .
    No preview · Article · Aug 2012 · Molecular BioSystems