Transmembrane helix prediction in proteins using hydrophobicity properties and higher-order statistics.
ABSTRACT Prediction of the transmembrane (TM) helices is important in the study of membrane proteins. A novel method to predict the location and length of both single and multiple TM helices in human proteins is presented. The proposed method is based on a combination of hydrophobicity and higher-order statistics, resulting in a TM prediction tool, namely K(4)HTM. A training dataset of 117 human single TM proteins and two test-datasets containing 499 and 484 human single and multiple TM proteins, respectively, were drawn from the SWISS-PROT public database and used for the optimisation and evaluation of K(4)HTM. Validation results showed that K(4)HTM correctly predicts the entire topology for 99.68% and 93.08% of the sequences in the single and multiple test-datasets, respectively. These results compare favourably with existing methods, such as SPLIT4, TMHMM2, WAVETM and SOSUI, constituting an alternative approach to the TM helix prediction problem.
ABSTRACT: A computer program that progressively evaluates the hydrophilicity and hydrophobicity of a protein along its amino acid sequence has been devised. For this purpose, a hydropathy scale has been composed wherein the hydrophilic and hydrophobic properties of each of the 20 amino acid side-chains are taken into consideration. The scale is based on an amalgam of experimental observations derived from the literature. The program uses a moving-segment approach that continuously determines the average hydropathy within a segment of predetermined length as it advances through the sequence. The consecutive scores are plotted from the amino to the carboxy terminus. At the same time, a midpoint line is printed that corresponds to the grand average of the hydropathy of the amino acid compositions found in most of the sequenced proteins. In the case of soluble, globular proteins there is a remarkable correspondence between the interior portions of their sequence and the regions appearing on the hydrophobic side of the midpoint line, as well as the exterior portions and the regions on the hydrophilic side. The correlation was demonstrated by comparisons between the plotted values and known structures determined by crystallography. In the case of membrane-bound proteins, the portions of their sequences that are located within the lipid bilayer are also clearly delineated by large uninterrupted areas on the hydrophobic side of the midpoint line. As such, the membrane-spanning segments of these proteins can be identified by this procedure. Although the method is not unique and embodies principles that have long been appreciated, its simplicity and its graphic nature make it a very useful tool for the evaluation of protein structures.
Journal of Molecular Biology 06/1982; 157(1):105-32. · 3.91 Impact Factor
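The moving-segment approach described in this abstract can be illustrated with a minimal Python sketch. The scale values are the widely reproduced Kyte-Doolittle hydropathy indices. The function names, the default window of 19 residues and the 1.6 cut-off are assumptions chosen for illustration (they are commonly quoted for membrane-spanning segments), not details stated in the abstract.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq.

    Element i is the average over residues i .. i + window - 1.
    Non-standard residue codes (e.g. 'X') raise KeyError.
    """
    values = [KD_SCALE[aa] for aa in seq.upper()]
    if window < 1 or window > len(values):
        return []
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Start indices (0-based) of windows whose mean exceeds threshold.

    The threshold is an assumed, illustrative cut-off.
    """
    profile = hydropathy_profile(seq, window)
    return [i for i, score in enumerate(profile) if score > threshold]
```

Runs of consecutive indices returned by `hydrophobic_windows` mark candidate membrane-spanning regions; plotting `hydropathy_profile` against residue number gives the kind of hydropathy plot the abstract describes.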
ABSTRACT: Protein segments that form amphipathic α-helices in their native state have periodic variation in the hydrophobicity values of the residues along the segment, with a 3.6 residue per cycle period characteristic of the α-helix. The assignment of hydrophobicity values to amino acids (hydrophobicity scale) affects the display of periodicity. Thirty-eight published hydrophobicity scales are compared for their ability to identify the characteristic period of α-helices, and an optimum scale for this purpose is computed using a new eigenvector method. Two of the published scales are also characterized by eigenvectors. We compare the usual method for detecting periodicity based on the discrete Fourier transform with a method based on a least-squares fit of a harmonic sequence to a sequence of hydrophobicity values. The two become equivalent for very long sequences, but, for shorter sequences with lengths commonly found in α-helices, the least-squares procedure gives a more reliable estimate of the period. The analog to the usual Fourier transform power spectrum is the “least-squares power spectrum”, the sum of squares accounted for in fitting a sinusoid of given frequency to a sequence of hydrophobicity values. The sum of the spectra of the α-helices in our data base peaks at 97.5°, and approximately 50% of the helices can account for this peak. Thus, approximately 50% of the α-helices appear to be amphipathic, and, of those that are, the dominant frequency at 97.5° rather than 100° indicates that the helix is slightly more open than previously thought, with the number of residues per turn closer to 3.7 than 3.6.
The extra openness is examined in crystallographic data, and is shown to be associated with the C terminus of the helix. The alpha amphipathic index, the key quantity in our analysis, measures the fraction of the total spectral area that is under the 97.5° peak, and is a characteristic of hydrophobicity scales that is consistent for different sets of helices. Our optimized scale maximizes the amphipathic index and has a correlation of 0.85 or higher with nine previously published scales. The most surprising feature of the optimized scale is that arginine tends to behave as if it were hydrophobic; i.e. in the crystallographic data base it has a tendency to be on the hydrophobic face of the amphipathic helix. Although the scale is optimal only for predicting α-amphipathicity, it also ranks high in identifying β-amphipathicity and in distinguishing interior from exterior residues in a protein. We factor the expressions for the power spectra into a matrix product so that the helical sequence information is isolated from the hydrophobicity scale. The largest eigenvalue of the matrix containing only helical sequence information also identifies the 97.5° frequency, thus confirming a 3.7 residue per turn spacing independently of hydrophobicity scales.
Journal of Molecular Biology 07/1987; · 3.91 Impact Factor
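As a rough illustration of the "usual" Fourier-based periodicity detection mentioned in this abstract, the sketch below evaluates the power spectrum of a mean-centred sequence of hydrophobicity values over angles from 0° to 180° and reports the angle with the highest power. It does not implement the least-squares power spectrum or the eigenvector method; the function names and the 1° scanning step are assumptions made for the example.

```python
import cmath
import math


def power_spectrum(hydrophobicity, angles_deg):
    """Fourier power of a mean-centred hydrophobicity sequence.

    For each angle w (degrees per residue) this returns
    |sum_k (h_k - mean) * exp(i * k * w)|^2.
    """
    n = len(hydrophobicity)
    mean = sum(hydrophobicity) / n
    centred = [h - mean for h in hydrophobicity]
    spectrum = []
    for angle in angles_deg:
        omega = math.radians(angle)
        total = sum(h * cmath.exp(1j * k * omega)
                    for k, h in enumerate(centred))
        spectrum.append(abs(total) ** 2)
    return spectrum


def dominant_angle(hydrophobicity, step_deg=1.0):
    """Angle in [0, 180] degrees at which the power spectrum peaks."""
    steps = int(round(180 / step_deg))
    angles = [i * step_deg for i in range(steps + 1)]
    spectrum = power_spectrum(hydrophobicity, angles)
    best = max(range(len(angles)), key=spectrum.__getitem__)
    return angles[best]


def residues_per_turn(angle_deg):
    """Convert a per-residue rotation angle into residues per turn."""
    return 360.0 / angle_deg
```

Under this conversion an angle of 100° corresponds to 3.6 residues per turn and 97.5° to roughly 3.69, which is the relationship the abstract relies on. For short segments the abstract reports that a least-squares fit estimates the period more reliably than this direct transform.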
ABSTRACT: We show that the so-called 'positive inside' rule, i.e. the observation that positively charged amino acids tend to be more prevalent in cytoplasmic than in extra-cytoplasmic segments in transmembrane proteins [von Heijne, G. (1986) EMBO J. 5, 3021-3027], seems to hold for all polar segments in multi-spanning eukaryotic membrane proteins irrespective of their position in the sequence and hence can be used in conjunction with hydrophobicity analysis to predict their transmembrane topology. Further, as suggested by others, we confirm that the net charge difference across the first transmembrane segment correlates well with its orientation [Hartmann, E., Rapoport, T. A. and Lodish, H. F. (1989) Proc. Natl Acad. Sci. USA 86, 5786-5790], and that the overall amino-acid composition of long polar segments can also be used to predict their cytoplasmic or extra-cytoplasmic location [Nakashima, H. and Nishikawa, K. (1992) FEBS Lett. 303, 141-146]. We present an approach to the topology prediction problem for eukaryotic membrane proteins based on a combination of these methods.
European Journal of Biochemistry 06/1993; 213(3):1333-40. · 3.58 Impact Factor
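A simplified reading of the 'positive inside' rule can be sketched in Python as follows. Given transmembrane segments that have already been located (for example by hydrophobicity analysis), the code counts lysine and arginine residues in the alternating polar segments and assigns the side with more positive charge to the cytoplasm. Counting only K and R, the function names, and the 0-based end-exclusive segment convention are all assumptions made for this example. The combined approach in the abstract also uses the net charge difference and amino-acid composition, which are not modelled here.

```python
def polar_segments(seq, tm_segments):
    """Split seq into the polar segments around the TM segments.

    tm_segments is a list of (start, end) pairs, 0-based and
    end-exclusive. The result has len(tm_segments) + 1 entries,
    starting with the N-terminal segment.
    """
    segments, start = [], 0
    for tm_start, tm_end in sorted(tm_segments):
        segments.append(seq[start:tm_start])
        start = tm_end
    segments.append(seq[start:])
    return segments


def predict_n_terminus(seq, tm_segments):
    """Predict the location of the N-terminus from K/R counts.

    Polar segments alternate sides of the membrane, so segments
    0, 2, 4, ... lie on the same side as the N-terminus.
    """
    segments = polar_segments(seq.upper(), tm_segments)
    positives = [seg.count("K") + seg.count("R") for seg in segments]
    n_side = sum(positives[0::2])
    other_side = sum(positives[1::2])
    if n_side == other_side:
        return "undetermined"
    return "cytoplasmic" if n_side > other_side else "extra-cytoplasmic"
```

For a made-up two-helix sequence such as `"MKRK" + "L" * 20 + "DDSE" + "L" * 20 + "RRKK"` with segments `[(4, 24), (28, 48)]`, the two terminal segments carry all of the positive residues, so the sketch places the N-terminus on the cytoplasmic side.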