Transmembrane helix prediction in proteins using hydrophobicity properties and higher-order statistics.
ABSTRACT: Prediction of transmembrane (TM) helices is important in the study of membrane proteins. A novel method to predict the location and length of both single and multiple TM helices in human proteins is presented. The proposed method combines hydrophobicity with higher-order statistics, yielding a TM prediction tool named K(4)HTM. A training dataset of 117 human single TM proteins and two test datasets containing 499 human single TM proteins and 484 human multiple TM proteins, respectively, were drawn from the SWISS-PROT public database and used for the optimisation and evaluation of K(4)HTM. Validation results showed that K(4)HTM correctly predicts the entire topology for 99.68% and 93.08% of the sequences in the single and multiple test datasets, respectively. These results compare favourably with existing methods, such as SPLIT4, TMHMM2, WAVETM and SOSUI, making K(4)HTM an alternative approach to the TM helix prediction problem.
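The abstract does not describe the internals of K(4)HTM, so the following is only a generic illustration of the two ingredients it names and not the published algorithm. It is a minimal sketch that assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window and a 1.6 cut-off (conventional choices for hydropathy plots, assumed here rather than taken from the paper), and it uses the sample fourth-order cumulant as one example of a higher-order statistic. All function names are hypothetical.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale (assumed here; the abstract names no scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, width=19):
    """Mean hydropathy of every full window of `width` residues."""
    values = [KD[aa] for aa in seq]
    return [mean(values[i:i + width]) for i in range(len(values) - width + 1)]


def fourth_cumulant(xs):
    """Sample fourth-order cumulant, m4 - 3*m2**2, from central moments."""
    xs = list(xs)
    mu = mean(xs)
    m2 = mean((x - mu) ** 2 for x in xs)
    m4 = mean((x - mu) ** 4 for x in xs)
    return m4 - 3 * m2 ** 2


def candidate_segments(seq, width=19, threshold=1.6):
    """Merge windows scoring >= threshold into (start, end) spans.

    Spans are 0-based with an exclusive end. The threshold and width are
    illustrative assumptions, not parameters of K(4)HTM.
    """
    segments = []
    for i, score in enumerate(window_profile(seq, width)):
        if score >= threshold:
            if segments and i <= segments[-1][1]:
                # Window overlaps or touches the previous span: extend it.
                segments[-1] = (segments[-1][0], i + width)
            else:
                segments.append((i, i + width))
    return segments


if __name__ == "__main__":
    # Toy sequence: a poly-leucine core flanked by charged residues.
    toy = "D" * 10 + "L" * 20 + "K" * 10
    for start, end in candidate_segments(toy):
        window_values = [KD[aa] for aa in toy[start:end]]
        print(start, end, fourth_cumulant(window_values))
```

A sketch like this only flags hydrophobic stretches; it says nothing about helix orientation or overall topology, which the evaluated methods also predict.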
ABSTRACT: Transmembrane protein topology prediction methods play an important role in structural biology, because determining the structure of these proteins with common biophysical, biochemical and molecular biological methods is extremely difficult. The need for accurate prediction methods is high, as the number of known membrane protein structures falls far behind the estimated number of these proteins in various genomes. The accuracy of these prediction methods appears to be higher than that of most prediction methods applied to globular proteins; however, it decreases slightly as the number of known structures increases. Unfortunately, most prediction algorithms use common machine learning techniques, and they do not reveal why topologies are predicted with such a high success rate or which biophysical or biochemical properties are important for achieving this level of accuracy. Incorporating the topology data determined so far into the prediction methods as constraints helps to reach even higher prediction accuracy; collecting such topology data is therefore also an important issue.
Current Protein and Peptide Science 11/2010; 11(7):550-61. · 2.89 Impact Factor