Transmembrane helix prediction in proteins using hydrophobicity properties and higher-order statistics.

Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki GR-54124, Greece.
Computers in Biology and Medicine (Impact Factor: 1.48). 07/2008; 38(8):867-80. DOI: 10.1016/j.compbiomed.2008.05.003
Source: PubMed

ABSTRACT: Prediction of transmembrane (TM) helices is important in the study of membrane proteins. A novel method to predict the location and length of both single and multiple TM helices in human proteins is presented. The proposed method combines hydrophobicity with higher-order statistics, resulting in a TM prediction tool named K(4)HTM. A training dataset of 117 human single TM proteins and two test datasets containing 499 human single TM proteins and 484 human multiple TM proteins, respectively, were drawn from the SWISS-PROT public database and used for the optimisation and evaluation of K(4)HTM. Validation results showed that K(4)HTM correctly predicts the entire topology for 99.68% and 93.08% of the sequences in the single and multiple test datasets, respectively. These results compare favourably with existing methods, such as SPLIT4, TMHMM2, WAVETM and SOSUI, and make K(4)HTM an alternative approach to the TM helix prediction problem.
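The abstract does not give the K(4)HTM algorithm itself, so the following is only a minimal sketch of the general idea it names: a per-residue hydrophobicity profile scanned with a sliding window, with a fourth-order statistic computed alongside the window mean. Everything specific here is an assumption for illustration and not taken from the paper: the Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold, the use of excess kurtosis as the higher-order statistic, and the function names.

```python
from statistics import fmean

# Kyte-Doolittle hydropathy scale (assumed here; the abstract names no scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence):
    """Map each residue of a one-letter sequence to its hydropathy value."""
    return [KD[residue] for residue in sequence.upper()]


def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance^2 - 3."""
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)
    m4 = fmean((v - mean) ** 4 for v in values)
    if m2 < 1e-12:  # (near-)constant window: define the statistic as 0
        return 0.0
    return m4 / (m2 * m2) - 3.0


def window_features(sequence, window=19):
    """Return (centre index, mean hydropathy, excess kurtosis) per window."""
    profile = hydropathy_profile(sequence)
    half = window // 2
    features = []
    for start in range(len(profile) - window + 1):
        chunk = profile[start:start + window]
        features.append((start + half, fmean(chunk), excess_kurtosis(chunk)))
    return features


def candidate_segments(sequence, window=19, threshold=1.6):
    """Merge consecutive window centres whose mean hydropathy exceeds the
    threshold into (start, end) segments (0-based, inclusive).

    Hypothetical decision rule: only the window mean is thresholded here;
    how K(4)HTM actually uses higher-order statistics is not stated.
    """
    segments = []
    for centre, mean, _kurtosis in window_features(sequence, window):
        if mean <= threshold:
            continue
        if segments and segments[-1][1] == centre - 1:
            segments[-1] = (segments[-1][0], centre)  # extend current run
        else:
            segments.append((centre, centre))  # start a new run
    return segments


if __name__ == "__main__":
    # Toy sequence: a 20-residue leucine stretch flanked by aspartates.
    toy = "D" * 30 + "L" * 20 + "D" * 30
    print(candidate_segments(toy))
```

The kurtosis column returned by `window_features` is left unused by the decision rule on purpose, so that a reader can experiment with their own way of combining it with the window mean.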

  • ABSTRACT: The identification of transmembrane segments in protein sequences is an important issue in the field of bioinformatics. In this study, a method is proposed for discriminating between proteins with single and multiple transmembrane segments, combining chemical and statistical features of the proteins with higher-order statistics and morphological analysis for protein categorisation. The method was tested on human proteins extracted from publicly available databases, and the results show that the proposed algorithm efficiently classifies the sequences under study into the two classes for a wide range of transmembrane segment lengths. This paves the way for a more efficient analysis of transmembrane proteins, taking into account the individual features and patterns occurring within proteins with single and multiple transmembrane segments.
    Conference proceedings: ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 02/2008; 2008:1351-4.
  • ABSTRACT: This paper is in the area of membrane proteins. Membrane proteins make up about 75% of possible targets for novel drug discovery. However, membrane proteins are one of the most understudied groups of proteins in biochemical research because of the technical difficulties in obtaining structural information about transmembrane regions or domains. Structural determination of TM regions is an important priority in the pharmaceutical industry, as it paves the way for structure-based drug design. This research presents a novel evolutionary support vector machine (SVM) based alpha-helix transmembrane region prediction algorithm to identify the membrane helices in amino acid sequences. The SVM-genetic algorithm (GA) methodology is based on the optimisation of sliding window size, evolutionary encoding selection and SVM parameter optimisation. In this research, average hydrophobicity and propensity based on skew statistics are used to encode the one-letter representation of amino acid sequence datasets. The computer simulation results demonstrate that the proposed SVM-GA methodology performs better than most conventional techniques, producing an accuracy of 86.71% for cross-validation and 86.43% for jack-knife for randomly selected proteins containing single and multiple transmembrane regions. Furthermore, for the amino acid sequence 3LVG, the proposed SVM-GA produces better alpha-helix region identification than PRED-TMR2, MEMSATSVM/MEMSAT3 and PSIPRED V3.0.
    Expert Systems with Applications 01/2013; 40(9):3412-3420. · 1.85 Impact Factor
  • ABSTRACT: Transmembrane protein topology prediction methods play important roles in structural biology, because determining the structure of these proteins is extremely difficult with the common biophysical, biochemical and molecular biological methods. The need for accurate prediction methods is high, as the number of known membrane protein structures falls far behind the estimated number of these proteins in various genomes. The accuracy of these prediction methods appears to be higher than that of most prediction methods applied to globular proteins; however, it decreases slightly with the increasing number of structures. Unfortunately, most prediction algorithms use common machine learning techniques, and they do not reveal why topologies are predicted with such a high success rate or which biophysical or biochemical properties are important to achieve this level of accuracy. Incorporating the topology data determined so far into the prediction methods as constraints helps to reach even higher prediction accuracy; therefore, collection of such topology data is also an important issue.
    Current Protein and Peptide Science 11/2010; 11(7):550-61. · 2.33 Impact Factor