Transmembrane helix prediction in proteins using hydrophobicity properties and higher-order statistics.
ABSTRACT Prediction of the transmembrane (TM) helices is important in the study of membrane proteins. A novel method to predict the location and length of both single and multiple TM helices in human proteins is presented. The proposed method is based on a combination of hydrophobicity and higher-order statistics, resulting in a TM prediction tool, namely K(4)HTM. A training dataset of 117 human single TM proteins and two test-datasets containing 499 and 484 human single and multiple TM proteins, respectively, were drawn from the SWISS-PROT public database and used for the optimisation and evaluation of K(4)HTM. Validation results showed that K(4)HTM correctly predicts the entire topology for 99.68% and 93.08% of the sequences in the single and multiple test-datasets, respectively. These results compare favourably with existing methods, such as SPLIT4, TMHMM2, WAVETM and SOSUI, constituting an alternative approach to the TM helix prediction problem.
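The abstract does not spell out how K(4)HTM combines hydrophobicity with higher-order statistics, so the following is only a minimal sketch of the two ingredients, not the authors' algorithm. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window, a smoothing threshold of 1.6 and the fourth-order cumulant as the higher-order statistic; the function names (`hydropathy_profile`, `fourth_cumulant`, `candidate_segments`) and all parameter values are hypothetical choices made for illustration.

```python
from statistics import fmean

# Kyte-Doolittle hydropathy scale (assumed here; the abstract names no scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq):
    """Map each residue to its hydropathy value (unknown residues -> 0.0, an assumption)."""
    return [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]


def windows(values, width):
    """Centred sliding windows, truncated at the sequence ends."""
    half = width // 2
    return [values[max(0, i - half): i + half + 1] for i in range(len(values))]


def fourth_cumulant(window):
    """Fourth-order cumulant of a sample: m4 - 3 * m2**2 (central moments)."""
    mean = fmean(window)
    m2 = fmean([(x - mean) ** 2 for x in window])
    m4 = fmean([(x - mean) ** 4 for x in window])
    return m4 - 3.0 * m2 ** 2


def cumulant_profile(seq, width=19):
    """Per-residue fourth-order cumulant of the windowed hydropathy values."""
    return [fourth_cumulant(w) for w in windows(hydropathy_profile(seq), width)]


def candidate_segments(seq, width=19, threshold=1.6, min_len=15):
    """Half-open (start, end) runs where the smoothed hydropathy exceeds the threshold."""
    smoothed = [fmean(w) for w in windows(hydropathy_profile(seq), width)]
    segments, start = [], None
    # A sentinel below any threshold closes a run that reaches the sequence end.
    for i, value in enumerate(smoothed + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


if __name__ == "__main__":
    # Toy sequence: polar flanks around a 24-residue leucine core.
    toy = "DDKKRRSSEE" + "L" * 24 + "DDKKRRSSEE"
    print(candidate_segments(toy))
    print(cumulant_profile(toy)[:5])
```

Because smoothing shrinks the region above the threshold, the reported segment is shorter than the hydrophobic core itself; a real predictor would need a boundary-refinement step, which is where a statistic such as the cumulant profile could plausibly contribute.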
- Source available from: Gabor E Tusnady
ABSTRACT: Transmembrane protein topology prediction methods play important roles in structural biology, because determining the structure of these proteins is extremely difficult with the common biophysical, biochemical and molecular biological methods. The need for accurate prediction methods is high, as the number of known membrane protein structures falls far behind the estimated number of these proteins in various genomes. The accuracy of these prediction methods appears to be higher than that of most prediction methods applied to globular proteins; however, it decreases slightly as the number of structures increases. Unfortunately, most prediction algorithms use common machine learning techniques, and they do not reveal why topologies are predicted with such a high success rate or which biophysical or biochemical properties are important for achieving this level of accuracy. Incorporating the topology data determined so far into the prediction methods as constraints helps to reach even higher prediction accuracy, so the collection of such topology data is also an important issue. Current Protein and Peptide Science 11/2010; 11(7):550-61. · 2.33 Impact Factor
- ABSTRACT: This paper is in the area of membrane proteins. Membrane proteins make up about 75% of possible targets for novel drug discovery. However, membrane proteins are one of the most understudied groups of proteins in biochemical research because of the technical difficulty of obtaining structural information about transmembrane (TM) regions or domains. Structural determination of TM regions is an important priority in the pharmaceutical industry, as it paves the way for structure-based drug design. This research presents a novel evolutionary support vector machine (SVM) based alpha-helix transmembrane region prediction algorithm to identify the membrane helices in amino acid sequences. The SVM-genetic algorithm (GA) methodology is based on the optimisation of sliding window size, evolutionary encoding selection and SVM parameter optimisation. In this research, average hydrophobicity and propensity based on skew statistics are used to encode the one-letter representation of amino acid sequence datasets. The computer simulation results demonstrate that the proposed SVM-GA methodology performs better than most conventional techniques, producing an accuracy of 86.71% for cross-validation and 86.43% for jack-knife for randomly selected proteins containing single and multiple transmembrane regions. Furthermore, for the amino acid sequence 3LVG, the proposed SVM-GA produces better alpha-helix region identification than PRED-TMR2, MEMSATSVM/MEMSAT3 and PSIPRED V3.0. Expert Systems with Applications 07/2013; 40(9):3412-3420. · 1.97 Impact Factor
- ABSTRACT: Transmembrane proteins are a special and important class of proteins in cells. Because of their importance and specificity, the prediction of transmembrane regions has considerable theoretical and practical significance. At present, the prediction methods are mainly based on the physicochemical properties and statistical analysis of amino acids. However, these methods are suitable for some environments but inapplicable to others. In this paper, multi-source information fusion theory is introduced to predict the transmembrane regions. The proposed method is tested on a data set of transmembrane proteins. The results show that the proposed method predicts the transmembrane regions well and is a powerful tool. Journal of Electronics (China) 03/2012; 29(1-2).