Mem-PHybrid: hybrid features-based prediction system for classifying membrane protein types.
ABSTRACT Membrane proteins are a major class of proteins and are encoded by approximately 20% to 30% of genes in most organisms. In this work, a novel two-layer membrane protein prediction system, called Mem-PHybrid, is proposed. In the first layer, it identifies the query protein as a membrane or nonmembrane protein. In the second layer, it identifies the type of membrane protein. Mem-PHybrid is based on hybrid features, formed by fusing physicochemical and split amino acid composition-based features. This enables it to exploit the discrimination capabilities of both feature extraction strategies. In addition, minimum redundancy and maximum relevance feature selection has been applied to reduce the dimensionality of the feature vector. We employ random forest, evidence-theoretic K-nearest neighbor, and support vector machine (SVM) as classifiers and analyze their performance on two datasets. SVM using hybrid features yields the highest accuracy: 89.6% and 97.3% on dataset1 and 91.5% and 95.5% on dataset2 for the jackknife and independent dataset tests, respectively. The enhanced prediction performance of Mem-PHybrid is largely attributed to the discrimination power of the hybrid features and the learning capability of SVM. Mem-PHybrid is accessible at http://www.188.8.131.52/Mem-PHybrid.
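To make the split amino acid composition idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the sequence is divided into an N-terminal segment, a middle segment, and a C-terminal segment, and that the 20-residue composition of each segment is concatenated into a 60-dimensional vector. The function names and the segment length of 25 residues are illustrative assumptions; the abstract does not state them.

```python
# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the fraction of each standard amino acid in a segment."""
    n = len(segment)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(a) / n for a in AMINO_ACIDS]


def split_aac(seq, n_term=25, c_term=25):
    """Concatenate the compositions of the N-terminal, middle, and C-terminal parts.

    n_term and c_term are hypothetical segment lengths chosen for illustration.
    """
    seq = seq.upper()
    if len(seq) < n_term + c_term:
        raise ValueError("sequence is shorter than the two terminal segments")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return composition(head) + composition(middle) + composition(tail)


# Toy sequence: 25 alanines, 10 cysteines, 25 aspartates.
features = split_aac("A" * 25 + "C" * 10 + "D" * 25)
```

In a hybrid scheme such as the one described, a vector like `features` would be concatenated with physicochemical descriptors before feature selection and classification; those steps are omitted here.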
ABSTRACT: Lysine acetylation is a reversible post-translational modification (PTM) that has been linked to many biological and pathological processes. Localizing lysine acetylation sites is therefore essential for deciphering the underlying mechanisms. Although many acetylated lysines in human proteins have been localized through wet-lab experiments, the catalog of such sites remains incomplete. In the present study, we propose a novel feature extraction approach, bi-relative adapted binomial score Bayes (BRABSB), combined with support vector machines (SVMs) to construct a human-specific lysine acetylation predictor. In 5-fold cross-validation experiments it yields, on average, a sensitivity of 83.91%, a specificity of 87.25%, and an accuracy of 85.58%. Validation on independent data sets shows that the proposed approach outperforms other existing lysine acetylation predictors. Furthermore, a global analysis of human lysine acetylproteins, which would ultimately facilitate the systematic investigation of the biological and pathological consequences of lysine acetylation events, has not yet been carried out. We therefore attempted a systematic analysis of human lysine acetylproteins, demonstrating their diversity with respect to subcellular localization and biological process, and the predominance of "binding" in terms of molecular function. Our analysis also revealed that human lysine acetylproteins are significantly enriched in neurodegenerative disorder and cancer pathways. Remarkably, lysine acetylproteins in mitochondria are significantly related to neurodegenerative disorders, whereas those in the nucleus are significantly involved in cancer pathways. These findings might ultimately provide novel global insights into such pathological processes for therapeutic purposes. The web server is deployed at . Molecular BioSystems 08/2012; 8(11):2964-73. · 3.35 Impact Factor
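As a worked illustration of how the three reported metrics relate, the sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The counts are hypothetical, chosen only so that the outputs match the reported percentages; they are not taken from the study. Note that (83.91 + 87.25) / 2 = 85.58, which is what one would expect if positive and negative samples were equally numerous, although the abstract does not state the class sizes.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true sites that are recovered
    specificity = tn / (tn + fp)   # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical balanced counts: 10000 positives and 10000 negatives.
sens, spec, acc = binary_metrics(tp=8391, fn=1609, tn=8725, fp=1275)
```

With balanced classes, accuracy is the arithmetic mean of sensitivity and specificity; with imbalanced classes the three numbers would no longer line up this way.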
ABSTRACT: Membrane proteins are prime constituents of a cell and act as mediators between intracellular and extracellular processes. Predicting transmembrane (TM) helices and their topology provides essential information about the function and structure of membrane proteins. However, this prediction is a challenging problem in bioinformatics and computational biology because of experimental complexities and the lack of established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose the WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using the compositional index and physicochemical properties. Redundant and irrelevant features are eliminated through singular value decomposition. The selected features from the two feature extraction strategies are then fused to develop a hybrid model. Weighted random forest is adopted as the classification approach. We used two benchmark datasets, a low-resolution and a high-resolution dataset. Tenfold cross-validation is employed to assess the performance of the WRF-TMH model at three levels: per protein, per segment, and per residue. The success rates of the WRF-TMH model are quite promising and are the best reported so far on the same datasets. The WRF-TMH model might play a substantial role in, and provide essential information for, further structural and functional studies of membrane proteins. The accompanying web predictor is accessible at http://184.108.40.206/WRF-TMH/ . Amino Acids 03/2013; · 3.91 Impact Factor
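The per-residue evaluation level mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes the observed and predicted topologies are given as equal-length label strings, with `M` marking a TM-helix residue and `-` marking any other residue. Both the labeling scheme and the function name are hypothetical.

```python
def per_residue_accuracy(observed, predicted):
    """Fraction of positions where the predicted label equals the observed label."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


# Toy example: the prediction ends the helix one residue too early.
score = per_residue_accuracy("--MMMM--", "--MMM---")
```

Per-segment and per-protein scores would be stricter, since a single misplaced helix boundary can count against a whole segment or protein; their exact definitions are not given in the abstract and are left out here.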