-
[show abstract]
[hide abstract]
ABSTRACT: The structure of a protein determines its function and its interactions with other factors. Regions of proteins that interact with ligands, substrates, and/or other proteins, tend to be conserved both in sequence and structure, and the residues involved are usually in close spatial proximity. More than 70,000 protein structures are currently found in the Protein Data Bank, and approximately one-third contain metal ions essential for function. Identifying and characterizing metal ion-binding sites experimentally is time-consuming and costly. Many computational methods have been developed to identify metal ion-binding sites, and most use only sequence information. For the work reported herein, we developed a method that uses sequence and structural information to predict the residues in metal ion-binding sites. Six types of metal ion-binding templates- those involving Ca(2+), Cu(2+), Fe(3+), Mg(2+), Mn(2+), and Zn(2+)-were constructed using the residues within 3.5 Å of the center of the metal ion. Using the fragment transformation method, we then compared known metal ion-binding sites with the templates to assess the accuracy of our method. Our method achieved an overall 94.6 % accuracy with a true positive rate of 60.5 % at a 5 % false positive rate and therefore constitutes a significant improvement in metal-binding site prediction.
PLoS ONE 01/2012; 7(6):e39252. · 4.09 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.
PLoS ONE 01/2011; 6(5):e20445. · 4.09 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: Hoarseness attributed to vocal cord palsy is associated with injury to the recurrent laryngeal nerve. Hoarseness resulting from left recurrent laryngeal nerve palsy, cardiovocal syndrome (Ortner's syndrome), has rarely been reported. We present the case of a 79-year-old male suffering from hoarseness in the absence of significant clinical manifestations. A flexible laryngoscope was used to identify a paralyzed left vocal cord, and contrast-enhanced computed tomography showed a large thrombus-filled aneurysmal dilation of the aortic arch. The severity of the vocal cord paralysis was improved by surgical intervention. This case illustrates that life-threatening cardiovascular comorbidities can cause hoarseness and that an impaired recurrent laryngeal nerve might be correctable.
The Kaohsiung journal of medical sciences 05/2009; 25(4):203-6. · 0.61 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: It has recently been shown that in proteins the atomic mean-square displacement (or B-factor) can be related to the number of the neighboring atoms (or protein contact number), and that this relationship allows one to compute the B-factor profiles directly from protein contact number. This method, referred to as the protein contact model, is appealing, since it requires neither trajectory integration nor matrix diagonalization. As a result, the protein contact model can be applied to very large proteins and can be implemented as a high-throughput computational tool to compute atomic fluctuations in proteins. Here, we show that this relationship can be further refined to that between the atomic mean-square displacement and the weighted protein contact-number, the weight being the square of the reciprocal distance between the contacting pair. In addition, we show that this relationship can be utilized to compute the cross-correlation of atomic motion (the B-factor is essentially the auto-correlation of atomic motion). For a nonhomologous dataset comprising 972 high-resolution X-ray protein structures (resolution <2.0 A and sequence identity <25%), the mean correlation coefficient between the X-ray and computed B-factors based on the weighted protein contact-number model is 0.61, which is better than those of the original contact-number model (0.51) and other methods. We also show that the computed correlation maps based on the weighted contact-number model are globally similar to those computed through normal model analysis for some selected cases. Our results underscore the relationship between protein dynamics and protein packing. We believe that our method will be useful in the study of the protein structure-dynamics relationship.
Proteins Structure Function and Bioinformatics 09/2008; 72(3):929-35. · 3.39 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: Recently, we have developed a method (Shih et al., Proteins: Structure, Function, and Bioinformatics 2007;68: 34-38) to compute correlation of fluctuations of proteins. This method, referred to as the protein fixed-point (PFP) model, is based on the positional vectors of atoms issuing from the fixed point, which is the point of the least fluctuations in proteins. One corollary from this model is that atoms lying on the same shell centered at the fixed point will have the same thermal fluctuations. In practice, this model provides a convenient way to compute the average dynamical properties of proteins directly from the geometrical shapes of proteins without the need of any mechanical models, and hence no trajectory integration or sophisticated matrix operations are needed. As a result, it is more efficient than molecular dynamics simulation or normal mode analysis. Though in the previous study the PFP model has been successfully applied to a number of proteins of various folds, it is not clear to what extent this model will be applied. In this article, we have carried out the comprehensive analysis of the PFP model for a dataset comprising 972 high-resolution X-ray structures with pairwise sequence identity <or=25%. We found that in most cases the PFP model works well. However, in case of proteins comprising multiple domains, each domain should be treated separately as an independent dynamical module with its own fixed point; and in case of the protein complex comprising a number of subunits, if functioning as a biological unit, the whole complex should be considered as one single dynamical module with one fixed point. Under such considerations, the resultant correlation coefficient between the computed and the X-ray structural B-factors for the data set is 0.59 and 75% (727/972) of proteins with a correlation coefficient >or=0.5. Our result shows that the fixed-point model is indeed quite general and will be a useful tool for high throughput analysis of dynamical properties of proteins.
Proteins Structure Function and Bioinformatics 08/2008; 72(2):625-34. · 3.39 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. Therefore, the ability to infer disulfide connectivity from protein sequences will be valuable in structural modeling and functional analysis. However, to predict disulfide connectivity directly from sequences presents a challenge to computational biologists due to the nonlocal nature of disulfide bonds, i.e., the close spatial proximity of the cysteine pair that forms the disulfide bond does not necessarily imply the short sequence separation of the cysteine residues. Recently, Chen and Hwang (Proteins 2005;61:507-512) treated this problem as a multiple class classification by defining each distinct disulfide pattern as a class. They used multiple support vector machines based on a variety of sequence features to predict the disulfide patterns. Their results compare favorably with those in the literature for a benchmark dataset sharing less than 30% sequence identity. However, since the number of disulfide patterns grows rapidly when the number of disulfide bonds increases, their method performs unsatisfactorily for the cases of large number of disulfide bonds. In this work, we propose a novel method to represent disulfide connectivity in terms of cysteine pairs, instead of disulfide patterns. Since the number of bonding states of the cysteine pairs is independent of that of disulfide bonds, the problem of class explosion is avoided. The bonding states of the cysteine pairs are predicted using the support vector machines together with the genetic algorithm optimization for feature selection. The complete disulfide patterns are then determined from the connectivity matrices that are constructed from the predicted bonding states of the cysteine pairs. Our approach outperforms the current approaches in the literature.
Proteins Structure Function and Bioinformatics 06/2007; 67(2):262-70. · 3.39 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: Because the protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences-some data sets comprising sequences up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vectors derived from sequences; the second level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets-one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization. Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches-especially those relying on homology search or sequence annotations. Our two-level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.
Proteins Structure Function and Bioinformatics 09/2006; 64(3):643-51. · 3.39 Impact Factor
-
[show abstract]
[hide abstract]
ABSTRACT: To identify functional structural motifs from protein structures of unknown function becomes increasingly important in recent years due to the progress of the structural genomics initiatives. Although certain structural patterns such as the Asp-His-Ser catalytic triad are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect a general structural motifs like, for example, the betabetaalpha-metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm combining both structural and sequence information to identify the local structure motifs. We applied our method to the following examples: the betabetaalpha-metal binding motif and the treble clef motif. The betabetaalpha-metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions such as the binding of nucleic acid and hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs in an automatic fashion. Our method may provide a useful means for automatic functional annotation through detecting structural motifs associated with particular functions.
Proteins Structure Function and Bioinformatics 05/2006; 63(3):636-43. · 3.39 Impact Factor