A Highly Accurate Statistical Approach for the Prediction of Transmembrane β-Barrels

Department of Biochemistry, Tulane University Health Sciences Center, New Orleans, LA 70112, USA.
Bioinformatics (Impact Factor: 4.98). 08/2010; 26(16):1965-74. DOI: 10.1093/bioinformatics/btq308
Source: PubMed


Transmembrane beta-barrels (TMBBs) belong to a special structural class of proteins predominantly found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts. TMBBs are surface-exposed proteins that perform a variety of functions ranging from nutrient acquisition to osmotic regulation. These properties suggest that TMBBs have great potential for use in vaccine or drug therapy development. However, membrane proteins such as TMBBs are notoriously difficult to identify and characterize using traditional experimental approaches, and current prediction methods are still unreliable.
A prediction method based on the physicochemical properties of experimentally characterized TMBB structures was developed to predict TMBB-encoding genes from genomic databases. With the most efficient prediction criteria, the Freeman-Wimley prediction algorithm developed in this study has an accuracy of 99% and a Matthews correlation coefficient (MCC) of 0.748, which is better than any previously published algorithm.
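The two figures quoted above measure different things, and a small worked example may help show why both are reported. When the positive class (here, TMBBs) is rare, accuracy can be very high while the MCC, which weighs all four cells of the confusion matrix, stays noticeably lower. The sketch below uses the standard definitions of accuracy and MCC. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are our own.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention
    for the case where a whole row or column of the matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical screen of 2000 proteins, 40 of which are true TMBBs.
# These numbers are illustrative assumptions, not data from the study.
counts = dict(tp=30, tn=1950, fp=10, fn=10)

print(f"accuracy = {accuracy(**counts):.3f}")  # accuracy = 0.990
print(f"MCC      = {mcc(**counts):.3f}")       # MCC      = 0.745
```

In this made-up case a quarter of the true positives are missed, yet accuracy is still 99% because the negatives dominate. The MCC reflects the missed and spurious hits more directly.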
The MS Windows-compatible application is available for download at

  • Source
    • "FASGAIACC [14] and SVM_PAAC [16], can predict OMPs using pseudo amino acid compositions and biochemical properties, which reported high prediction accuracy in existing methods. Several methods claimed that they have excellent performance for OMP prediction using structural information, including secondary structures [18] and topology structures [19] [20] [21] [22] [23]. Structure-based methods are usually more accurate than sequence-based methods. "
    ABSTRACT: Outer membrane proteins (OMPs) play important roles in bacterial cellular processes. Discriminating OMPs from other fold types of proteins is helpful for successful prediction of their structures and for the precise design of OMP-targeted drugs. In this paper, we developed a novel prediction method based on primary sequence features and support vector machine (SVM) algorithms. For protein sequences, discriminative features were extracted by combining sequence encoding based on grouped weights (EBGW), amino acid compositions and biochemical properties. Feature subsets were screened using the F-score algorithm to train an SVM-based classifier, namely EBGW_OMP. The performance of EBGW_OMP was examined on a benchmark dataset of 1087 proteins. The results show that EBGW_OMP can discriminate OMPs from globular proteins, α-helical membrane proteins or non-OMPs with cross-validated accuracy of 98.0%, 97.6% or 97.9%, respectively, which outperformed existing sequence-based methods. EBGW_OMP also successfully distinguished 681 out of 722 OMPs with 97.0% accuracy in another benchmark dataset of 2657 proteins. Genome-wide tests show that EBGW_OMP has excellent capability of correctly detecting OMPs and is worth considering for genomic OMP prediction. The web server that implements EBGW_OMP is freely accessible at OMP.
    Full-text · Conference Paper · May 2014
  • Source
    • "For example, the Freeman-Wimley prediction algorithm was developed to improve the prediction of transmembrane β-barrel proteins over previous algorithms to an accuracy of 99% and MCC score of 0.75 [57]. Freeman and Wimley [57] demonstrated that their prediction algorithm was more accurate than BOMP and TMBETADISC-RBF, two of the methods used in the present study. This method could potentially be incorporated into our transmembrane β-barrel protein predictor group to further improve the prediction performance of our prediction method if it was available as an online user-friendly tool for genome-scale prediction. "
    ABSTRACT: Outer membrane proteins (OMPs) of Pasteurella multocida have various functions related to virulence and pathogenesis and represent important targets for vaccine development. Various bioinformatic algorithms can predict outer membrane localization and discriminate OMPs by structure or function. The designation of a confident prediction framework by integrating different predictors followed by consensus prediction, results integration and manual confirmation will improve the prediction of the outer membrane proteome. In the present study, we used 10 different predictors classified into three groups (subcellular localization, transmembrane β-barrel protein and lipoprotein predictors) to identify putative OMPs from two available P. multocida genomes: those of avian strain Pm70 and porcine non-toxigenic strain 3480. Predicted proteins in each group were filtered by optimized criteria for consensus prediction: at least two positive predictions for the subcellular localization predictors, three for the transmembrane β-barrel protein predictors and one for the lipoprotein predictors. The consensus predicted proteins were integrated from each group into a single list of proteins. We further incorporated a manual confirmation step including a public database search against PubMed and sequence analyses, e.g. sequence and structural homology, conserved motifs/domains, functional prediction, and protein-protein interactions to enhance the confidence of prediction. As a result, we were able to confidently predict 98 putative OMPs from the avian strain genome and 107 OMPs from the porcine strain genome with 83% overlap between the two genomes. The bioinformatic framework developed in this study has increased the number of putative OMPs identified in P. multocida and allowed these OMPs to be identified with a higher degree of confidence. Our approach can be applied to investigate the outer membrane proteomes of other Gram-negative bacteria.
    Full-text · Article · Apr 2012 · BMC Bioinformatics
  • Source
    • "This is mainly due to experimental difficulties and complexity of the TMB structure [6]. Consequently, various learning-based techniques have been developed for discriminating TMB proteins from globular and transmembrane α-helical proteins [6-8], and for predicting TMB secondary structures [7-12]. We first discuss these methods and their potential shortcomings in detail, and then proceed with describing our approach. "
    ABSTRACT: Transmembrane β-barrel proteins are a special class of transmembrane proteins which play several key roles in the human body and in diseases. Due to experimental difficulties, the number of transmembrane β-barrel proteins with known structures is very small. Over the years, a number of learning-based methods have been introduced for recognition and structure prediction of transmembrane β-barrel proteins. Most of these methods emphasize homology search rather than any biological or chemical basis. We present a novel graph-theoretic model for classification and structure prediction of transmembrane β-barrel proteins. This model folds proteins based on energy minimization rather than a homology search, avoiding any assumption on the availability of a training dataset. The ab initio model presented in this paper is the first method to allow for permutations in the structure of transmembrane proteins and provides more structural information than any known algorithm. The model is also able to recognize β-barrels by assessing the pseudo free energy. We assess the structure prediction on 41 proteins gathered from existing databases on experimentally validated transmembrane β-barrel proteins. We show that our approach is quite accurate with over 90% F-score on strands and over 74% F-score on residues. The results are comparable to other algorithms, suggesting that our pseudo-energy model is close to the actual physical model. We test our classification approach and show that it is able to reject α-helical bundles with 100% accuracy and β-barrel lipocalins with 97% accuracy. We show that it is possible to design models for classification and structure prediction for transmembrane β-barrel proteins which do not depend essentially on training sets but on combinatorial properties of the structures to be proved. These models are fairly accurate, robust and can be run very efficiently on PC-like computers. Such models are useful for genome screening.
    Full-text · Article · Apr 2012 · BMC Genomics