Mem-PHybrid: Hybrid features-based prediction system for classifying membrane protein types

Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Nilore, Islamabad, Pakistan.
Analytical Biochemistry (Impact Factor: 2.31). 02/2012; 424(1):35-44. DOI: 10.1016/j.ab.2012.02.007
Source: PubMed

ABSTRACT Membrane proteins are a major class of proteins and are encoded by approximately 20% to 30% of genes in most organisms. In this work, a novel two-layer membrane protein prediction system, called Mem-PHybrid, is proposed. At the first level, it identifies whether a query protein is a membrane or a nonmembrane protein; at the second level, it further identifies the type of membrane protein. Mem-PHybrid is based on hybrid features, formed by fusing physicochemical features with split amino acid composition-based features, which lets it exploit the discriminative power of both feature extraction strategies. In addition, minimum redundancy maximum relevance (mRMR) feature selection is applied to reduce the dimensionality of the feature vector. We employ random forest, evidence-theoretic K-nearest neighbor, and support vector machine (SVM) classifiers and analyze their performance on two datasets. SVM with hybrid features yields the highest accuracies: 89.6% (jackknife test) and 97.3% (independent dataset test) on dataset1, and 91.5% (jackknife test) and 95.5% (independent dataset test) on dataset2. The improved prediction performance of Mem-PHybrid is largely attributable to the discriminative power of the hybrid features and the learning capability of SVM. Mem-PHybrid is accessible at http://www.
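The abstract does not spell out how the two feature types are computed or fused. As a rough illustration only, the sketch below computes a split amino acid composition (SAAC), i.e. separate amino acid compositions of the N-terminal, middle, and C-terminal segments of a sequence, and concatenates it with two physicochemical descriptors taken from the Kyte-Doolittle hydropathy scale. The 25-residue terminal segments, the choice of hydropathy as the physicochemical property, the 19-residue window, and all function names are assumptions for illustration, not details reported for Mem-PHybrid.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(segment: str) -> list[float]:
    """Relative frequency of each standard amino acid in `segment`."""
    counts = [segment.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def split_aac(seq: str, n_len: int = 25, c_len: int = 25) -> list[float]:
    """60-D split composition (hypothetical segment lengths).

    Segments do not overlap; for short sequences the middle and/or
    C-terminal segment may be empty and then contribute zeros.
    """
    seq = seq.upper()
    c_start = max(n_len, len(seq) - c_len)
    n_part, middle, c_part = seq[:n_len], seq[n_len:c_start], seq[c_start:]
    return composition(n_part) + composition(middle) + composition(c_part)


def hydropathy_features(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy and the highest mean over any `window`-residue stretch."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        return [0.0, 0.0]
    mean = sum(values) / len(values)
    if len(values) < window:
        return [mean, mean]
    peak = max(sum(values[i:i + window]) / window
               for i in range(len(values) - window + 1))
    return [mean, peak]


def hybrid_features(seq: str) -> list[float]:
    """Fuse both feature types by simple concatenation into one 62-D vector."""
    return split_aac(seq) + hydropathy_features(seq)
```

For any protein sequence string, hybrid_features returns 62 numbers; in a full pipeline these would then go through a selection step such as mRMR and into a classifier such as an SVM, both of which are omitted here.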

Available from: Asifullah Khan, Apr 02, 2014
    ABSTRACT: Proteins are the executants of biological functions in living organisms. Comprehending protein structure is a challenging problem in the era of proteomics, computational biology, and bioinformatics because of its pivotal role in protein folding patterns. Owing to the explosive growth of protein sequences in protein databanks and the intricacy of protein structures, experimental and theoretical methods are insufficient for predicting protein structure classes. It is therefore highly desirable to develop an accurate, reliable, and high-throughput computational model that correctly predicts protein structure classes from protein sequences. In this regard, we propose a model that employs a hybrid descriptor space in conjunction with an optimized evidence-theoretic K-nearest neighbor algorithm. The hybrid space combines two descriptor spaces: Multi-profile Bayes and bi-gram probability. To enhance the generalization power of the classifier, highly discriminative descriptors are selected from the hybrid space using particle swarm optimization, a well-known evolutionary feature selection technique. The proposed model is evaluated with the jackknife test on three low-similarity benchmark datasets, 25PDB, 1189, and 640, on which it achieves success rates of 87.0%, 86.6%, and 88.4%, respectively. Comparative analysis shows that the proposed model yields promising results compared with existing methods in the literature. In addition, the proposed prediction system might be helpful in future research, particularly where the focus is on low-similarity datasets.
    Journal of Theoretical Biology 12/2013; 346. DOI:10.1016/j.jtbi.2013.12.015 · 2.30 Impact Factor
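Bi-gram probability descriptors are commonly computed from a position-specific probability profile P (for example, one derived from a PSI-BLAST PSSM) as B[m, n] = sum over k of P[k, m] * P[k + 1, n], i.e. the summed probability that amino acid m at position k is followed by amino acid n at position k + 1, giving 400 values. The sketch below assumes the L x 20 profile has already been built; how the cited model derives its profile, as well as its Multi-profile Bayes and particle swarm optimization steps, is not shown, and the function name is hypothetical.

```python
import numpy as np


def bigram_probability(profile) -> np.ndarray:
    """400-D bi-gram features from an L x 20 probability profile.

    B[m, n] = sum_k P[k, m] * P[k + 1, n] over adjacent positions k, k + 1.
    """
    P = np.asarray(profile, dtype=float)
    if P.ndim != 2 or P.shape[1] != 20:
        raise ValueError("expected an L x 20 profile")
    # (20 x (L-1)) @ ((L-1) x 20) sums the outer products of adjacent rows.
    B = P[:-1].T @ P[1:]
    return B.reshape(-1)  # row-major: index m * 20 + n
```

If the profile is a one-hot encoding of the sequence itself, B reduces to plain dipeptide counts; with a real probability profile, each adjacent pair of positions contributes a soft count spread over all 400 dipeptides.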
    ABSTRACT: Membrane proteins are prime constituents of a cell and mediate between intracellular and extracellular processes. Prediction of transmembrane (TM) helices and their topology provides essential information about the function and structure of membrane proteins. However, predicting TM helices and their topology is a challenging problem in bioinformatics and computational biology, owing to experimental complexities and the lack of established structures. The location and orientation of TM helix segments are therefore predicted from topogenic sequences. In this regard, we propose the WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using compositional index and physicochemical properties. Redundant and irrelevant features are eliminated through singular value decomposition. The features selected from these feature extraction strategies are then fused to develop a hybrid model, and a weighted random forest is adopted as the classifier. Two benchmark datasets are used, one of low and one of high resolution. Tenfold cross-validation is employed to assess the performance of WRF-TMH at the per-protein, per-segment, and per-residue levels. The success rates of WRF-TMH are promising and are the best reported so far on the same datasets. WRF-TMH might therefore play a substantial role in providing essential information for further structural and functional studies of membrane proteins. The accompanying web predictor is accessible at .
    Amino Acids 03/2013; DOI:10.1007/s00726-013-1466-4 · 3.65 Impact Factor
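The abstract says redundant features are removed with singular value decomposition but gives no details. One common reading is to project the mean-centred feature matrix onto its leading right singular vectors; the NumPy sketch below does that, keeping just enough components to retain a chosen fraction of the total variance. The 95% threshold and the function name are illustrative assumptions, not parameters reported for WRF-TMH. Note that this produces new combined features rather than deleting individual original ones.

```python
import numpy as np


def svd_reduce(X, energy=0.95):
    """Reduce a samples x features matrix with a truncated SVD.

    Keeps the smallest number k of right singular vectors whose squared
    singular values explain at least `energy` of the total variance of the
    mean-centred data. Returns (reduced data, features x k basis, feature means).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    power = s ** 2
    total = power.sum()
    if total == 0.0:
        k = 1  # constant data: nothing to explain, keep a single axis
    else:
        explained = np.cumsum(power) / total
        # First index whose cumulative share reaches `energy`, as a count.
        k = min(int(np.searchsorted(explained, energy)) + 1, len(s))
    basis = Vt[:k].T
    return Xc @ basis, basis, mean
```

The returned mean and basis are kept so that held-out data can be transformed consistently as (X_test - mean) @ basis; in a tenfold cross-validation, fitting them on the training folds only avoids leaking test information into the reduction.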
    ABSTRACT: Since experimental techniques are time-consuming and costly, in silico protein structure prediction is essential for producing conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still has many issues, including poor performance for targets longer than 100 residues, excessive running times, and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints into their fragment selection process. Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. With either CATH- or SCOP-based structural class annotations, the improvement in structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (-0.4, p-values < 0.005). Although the CATH and SCOP classifications differ, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis shows that methods relying on class-based fragments produce conformations that are more relevant to the user and converge more quickly towards the best model as estimated by GDT_TS (up to 10% on average). This substantiates our hypothesis that using structurally relevant templates not only reduces the size of the conformation space to be explored but also focuses the search on a more relevant area. Since our methodology produces models whose quality is up to 7% higher on average than that of models generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction.
Despite such progress, ab initio prediction remains a challenging task, especially for medium-sized and large proteins. Apart from improving search strategies and energy functions, integrating additional constraints seems a promising route, especially if they can be predicted accurately from sequence alone.
    BMC Bioinformatics 04/2015; 16(1). DOI:10.1186/s12859-015-0576-2 · 2.67 Impact Factor
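GDT_TS and RMSD are the two quality measures quoted in the abstract above. The sketch below computes RMSD after optimal (Kabsch) superposition, plus a simplified GDT_TS: the mean, over the 1, 2, 4, and 8 Å cutoffs, of the percentage of residues whose model and reference positions lie within the cutoff. It assumes the model and reference are equal-length, residue-matched N x 3 arrays (for example, C-alpha coordinates), and the function names are hypothetical. The standard GDT_TS, as computed by LGA, searches many superpositions to maximize each fraction separately, so this single-superposition version gives a lower bound rather than the official score.

```python
import numpy as np


def kabsch_superimpose(model, ref):
    """Rotate and translate `model` (N x 3) to minimise RMSD against `ref`."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(ref, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_mean, Q - q_mean
    H = Pc.T @ Qc  # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc @ R.T + q_mean


def rmsd(a, b):
    """Root-mean-square deviation between two matched N x 3 arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def simplified_gdt_ts(model, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0-100) from a single Kabsch superposition."""
    fitted = kabsch_superimpose(model, ref)
    dist = np.linalg.norm(fitted - np.asarray(ref, dtype=float), axis=1)
    return float(100.0 * np.mean([np.mean(dist <= c) for c in cutoffs]))
```

A model that is only a rigid rotation and translation of the reference should score an RMSD near 0 Å and a GDT_TS of 100 after superposition, which is a quick sanity check before applying the functions to predicted models.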