-
ABSTRACT: We present a method to predict the solvent accessibility of proteins which is based on a nearest neighbor method applied to the sequence profiles. Using the method, continuous real-value prediction as well as two-state and three-state discrete predictions can be obtained. The method utilizes the z-score value of the distance measure in the feature vector space to estimate the relative contribution among the k-nearest neighbors for prediction of the discrete and continuous solvent accessibility. The solvent accessibility database is constructed from 5717 proteins extracted from the PISCES culling server with a cutoff of 25% sequence identity. Using optimal parameters, prediction accuracies (for discrete predictions) of 78.38% (two-state prediction with the threshold of 25%) and 65.1% (three-state prediction with the thresholds of 9 and 36%), and a Pearson correlation coefficient (between the predicted and true RSAs for continuous prediction) of 0.676 are achieved. An independent benchmark test was performed with the CASP8 targets, where we find that the proposed method outperforms existing methods. The prediction accuracies are 80.89% (for two-state prediction with the threshold of 25%) and 67.58% (three-state prediction), and the Pearson correlation coefficient is 0.727 (for continuous prediction) with a mean absolute error of 0.148. We have also investigated the effect of increasing database sizes on the prediction accuracy, where additional improvement in the accuracy is observed as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/.
Proteins Structure Function and Bioinformatics 03/2012; 80(7):1791-7. · 3.39 Impact Factor
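The z-score weighting of nearest neighbors described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything here is an assumption for illustration: the function names `znn_predict` and `two_state`, the Euclidean distance, the choice of weight (the negated z-score of each neighbor's distance, clipped at zero), and the "buried"/"exposed" labels. Only the 25% two-state threshold comes from the abstract.

```python
import math

def znn_predict(query, database, k=3):
    """Predict a continuous RSA value for `query` (hypothetical sketch).

    `database` is a list of (feature_vector, rsa) pairs. Each of the k
    nearest neighbors is weighted by how far below the mean distance it
    lies, measured as a z-score over all distances to the database.
    """
    dists = [(math.dist(query, vec), rsa) for vec, rsa in database]
    all_d = [d for d, _ in dists]
    mean = sum(all_d) / len(all_d)
    std = math.sqrt(sum((d - mean) ** 2 for d in all_d) / len(all_d))

    nearest = sorted(dists, key=lambda t: t[0])[:k]
    if std > 0:
        # Assumed weight: negated z-score, so closer neighbors count more.
        weights = [max(0.0, -(d - mean) / std) for d, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    if sum(weights) == 0:
        # Fall back to a plain average if no neighbor beats the mean distance.
        weights = [1.0] * len(nearest)

    return sum(w * rsa for w, (_, rsa) in zip(weights, nearest)) / sum(weights)

def two_state(rsa, threshold=0.25):
    """Two-state label using the 25% threshold quoted in the abstract."""
    return "exposed" if rsa >= threshold else "buried"

# Toy data: two low-RSA points near the query, two high-RSA points far away.
toy_db = [((0, 0), 0.1), ((0, 1), 0.2), ((5, 5), 0.8), ((6, 5), 0.9)]
prediction = znn_predict((0, 0), toy_db, k=2)
label = two_state(prediction)
```

In this toy example the prediction lies between the RSA values of the two nearest neighbors and is pulled toward the closer one, which is the intended effect of distance-dependent weighting.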
-
ABSTRACT: High-accuracy protein modeling from sequence information is an important step toward revealing the sequence-structure-function relationship of proteins, and it is becoming increasingly useful for practical purposes such as drug discovery and protein design. We have developed a protocol for protein structure prediction that can generate highly accurate protein models in terms of backbone structure, side-chain orientation, hydrogen bonding, and binding sites of ligands. To obtain accurate protein models, we have combined a powerful global optimization method with traditional homology modeling procedures such as multiple sequence alignment, chain building, and side-chain remodeling. We have built a series of specific score functions for these steps, and optimized them by utilizing conformational space annealing, which is one of the most successful combinatorial optimization algorithms currently available.
Methods in molecular biology (Clifton, N.J.) 01/2012; 857:175-88.
-
ABSTRACT: The rapid increase in the number of experimentally determined protein structures in recent years enables us to obtain more reliable protein tertiary structure models than ever by template-based modeling. However, refinement of template-based models beyond the limit available from the best templates is still needed for understanding protein function in atomic detail. In this work, we develop a new method for protein terminus modeling that can be applied to refinement of models with unreliable terminus structures. The energy function for terminus modeling consists of both physics-based and knowledge-based potential terms with carefully optimized relative weights. Effective sampling of both the framework and terminus is performed using the conformational space annealing technique. This method has been tested on a set of termini derived from a nonredundant structure database and two sets of termini from the CASP8 targets. The performance of the terminus modeling method is significantly improved over our previous method that does not employ terminus refinement. It is also comparable or superior to the best server methods tested in CASP8. The success of the current approach suggests that a similar strategy may be applied to other types of refinement problems such as loop modeling or secondary structure rearrangement.
Proteins Structure Function and Bioinformatics 09/2011; 79(9):2725-34. · 3.39 Impact Factor
-
ABSTRACT: Enterokinase light chain (EKL) is a serine protease that recognizes the Asp-Asp-Asp-Asp-Lys (D(4)K) sequence and cleaves the C-terminal peptide bond of the lysine residue. The utility of EKL as a site-specific cleavage enzyme is hampered by sporadic cleavage at sites other than the canonical D(4)K recognition sequence. In order to produce more site-specific EKL, we have generated several EKL mutants in E. coli with substitutions at Tyr174 and Lys99 using a PDI (protein disulfide isomerase) fusion system. Substitution of Tyr174 by basic residues confers higher specificity on EKL. The production of EKL with higher specificity could widen the utility of EKL as a site-specific cleavage enzyme to produce various recombinant proteins with therapeutic or industrial value.
Biotechnology Letters 02/2011; 33(6):1227-32. · 1.68 Impact Factor
-
ABSTRACT: The half reactions of ω-aminotransferase (ω-AT) from Vibrio fluvialis JS17 (ω-ATVf) were carried out using purified pyridoxal 5'-phosphate-enzyme (PLP-Enz) and pyridoxamine 5'-phosphate-enzyme (PMP-Enz) complexes to investigate the relative activities of substrates. In the reaction generating PMP-Enz from PLP-Enz using L-alanine as an amine donor, L-alanine showed about 70% of the initial reaction rate of (S)-α-methylbenzylamine ((S)-α-MBA). However, in the subsequent half reaction recycling PLP-Enz from PMP-Enz using acetophenone as an amine acceptor, acetophenone showed nearly negligible reactivity compared to pyruvate. These results indicate that the main bottleneck in the asymmetric synthesis of (S)-α-MBA lies not in the amination of PLP by alanine, but in the amination of acetophenone by PMP-Enz, where conformational restraints of the enzyme structure are likely to be the main reason for limiting the amine group transfer from PMP-Enz to acetophenone. Based upon those half reaction experiments using the two amino acceptors of different activity, it appears that the relative activities of the two amine donors and the two acceptors involved in the ω-AT reactions can roughly determine the asymmetric synthesis yield of the target chiral amine compound. Predicted conversion yields of several target chiral amines were calculated and compared with the experimental conversion yields. An approximately linear positive correlation (Pearson's correlation coefficient = 0.92) was observed between the calculated values and the experimental conversion yields. To overcome the low (S)-α-MBA productivity of ω-ATVf caused by the possible disadvantageous structural constraints for acetophenone, new ω-ATs showing higher affinity to the benzene ring of acetophenone than ω-ATVf were computationally screened using comparative modeling and protein-ligand docking. ω-ATs from Streptomyces avermitilis MA-4680 (SAV2612) and Agrobacterium tumefaciens str. C58 (Atu4761) were selected, and the two screened ω-ATs showed a higher asymmetric synthesis reaction rate of (S)-α-MBA and a lower (S)-α-MBA degradation reaction rate than ω-ATVf. To verify the higher conversion yield of the variants of ω-ATs, the reaction with 50 mM acetophenone and 50 mM alanine was performed with coupling of lactate dehydrogenase and a two-phase reaction system. SAV2612 and Atu4761 showed 70% and 59% enhanced yield in the synthesis of (S)-α-MBA compared to that of ω-ATVf, respectively.
Biotechnology and Bioengineering 02/2011; 108(2):253-63. · 3.95 Impact Factor
-
ABSTRACT: A correct alignment is an essential requirement in homology modeling. Yet in order to bridge the structural gap between template and target, which may not only involve loop rearrangements, but also shifts of secondary structure elements and repacking of core residues, high-resolution refinement methods with full atomic details are needed. Here, we describe four approaches that address this “last mile of the protein folding problem” and have performed well during CASP8, yielding physically realistic models: YASARA, which runs molecular dynamics simulations of models in explicit solvent, using a new partly knowledge-based all atom force field derived from Amber, whose parameters have been optimized to minimize the damage done to protein crystal structures. The LEE-SERVER, which makes extensive use of conformational space annealing to create alignments, to help Modeller build physically realistic models while satisfying input restraints from templates and CHARMM stereochemistry, and to remodel the side-chains. ROSETTA, whose high resolution refinement protocol combines a physically realistic all atom force field with Monte Carlo minimization to allow the large conformational space to be sampled quickly. And finally UNDERTAKER, which creates a pool of candidate models from various templates and then optimizes them with an adaptive genetic algorithm, using a primarily empirical cost function that does not include bond angle, bond length, or other physics-like terms. Proteins 2009. © 2009 Wiley-Liss, Inc.
Proteins Structure Function and Bioinformatics 08/2009; 77(S9):114 - 122. · 3.39 Impact Factor
-
ABSTRACT: Structural information of a protein can guide one to understand the function of the protein, and ligand binding is one of the major biochemical functions of proteins. We have applied a two-stage template-based ligand binding site prediction method to CASP8 targets and achieved high quality results with accuracy/coverage = 70/80 (LEE). First, templates are used for protein structure modeling and then for binding site prediction by structural clustering of ligand-containing templates to the predicted protein model. Remarkably, the results are only a few percent worse than those one can obtain from native structures, which were available only after the prediction. Prediction was performed without knowing the identity of the ligands, and consequently, in many cases the ligand molecules used for prediction were different from the actual ligands, and yet we find that the prediction was quite successful. The current approach can be easily combined with experiments to investigate protein activities in a systematic way.
Proteins Structure Function and Bioinformatics 08/2009; 77 Suppl 9:152-6. · 3.39 Impact Factor
-
Jae-Sung Woo,
Jae-Hong Lim,
Ho-Chul Shin,
Min-Kang Suh,
Bonsu Ku,
Kwang-Hoon Lee,
Keehyoung Joo,
Howard Robinson,
Jooyoung Lee,
Sam-Yong Park,
Nam-Chul Ha,
Byung-Ha Oh
ABSTRACT: Condensins are key mediators of chromosome condensation across organisms. Like other condensins, the bacterial MukBEF condensin complex consists of an SMC family protein dimer containing two ATPase head domains, MukB, and two interacting subunits, MukE and MukF. We report complete structural views of the intersubunit interactions of this condensin along with ensuing studies that reveal a role for the ATPase activity of MukB. MukE and MukF together form an elongated dimeric frame, and MukF's C-terminal winged-helix domains (C-WHDs) bind MukB heads to constitute closed ring-like structures. Surprisingly, one of the two bound C-WHDs is forced to detach upon ATP-mediated engagement of MukB heads. This detachment reaction depends on the linker segment preceding the C-WHD, and mutations on the linker restrict cell growth. Thus ATP-dependent transient disruption of the MukB-MukF interaction, which creates openings in condensin ring structures, is likely to be a critical feature of the functional mechanism of condensins.
Cell 02/2009; 136(1):85-96. · 32.40 Impact Factor
-
ABSTRACT: We have investigated the effect of rigorous optimization of the MODELLER energy function for possible improvement in protein all-atom chain-building. For this we applied the global optimization method called conformational space annealing (CSA) to the standard MODELLER procedure to achieve better energy optimization than what MODELLER provides. The method, which we call MODELLERCSA, is tested on two benchmark sets. The first is the 298 proteins taken from the HOMSTRAD multiple alignment set. By simply optimizing the MODELLER energy function, we observe significant improvement in side-chain modeling, where MODELLERCSA provides about 10.7% (14.5%) improvement for chi(1) (chi(1) + chi(2)) accuracy compared to the standard MODELLER modeling. The improvement of backbone accuracy by MODELLERCSA is shown to be less prominent, and a similar improvement can be achieved by simply generating many standard MODELLER models and selecting the lowest-energy models. However, the level of side-chain modeling accuracy by MODELLERCSA could not be matched by extensive MODELLER strategies, side-chain remodeling by SCWRL3, or copying unmutated rotamers. The identical procedure was successfully applied to 100 CASP7 template-based modeling domains during the prediction season in a blind fashion, and the results are included here for comparison. From this study, we observe a good correlation between the MODELLER energy and the side-chain accuracy. Our findings indicate that, when a good alignment between a target protein and its templates is provided, thorough optimization of the MODELLER energy function leads to accurate all-atom models.
Proteins Structure Function and Bioinformatics 12/2008; 75(4):1010-23. · 3.39 Impact Factor
-
ABSTRACT: We present a new method for multiple sequence alignment (MSA), which we call MSACSA. The method is based on the direct application of a global optimization method called conformational space annealing (CSA) to a consistency-based score function constructed from pairwise sequence alignments between constituting sequences. We applied MSACSA to two MSA databases, the 82 families from the BAliBASE reference set 1 and the 366 families from the HOMSTRAD set. In all 450 cases, we obtained well-optimized alignments satisfying more pairwise constraints and, in consequence, producing more accurate alignments on average compared with a recent alignment method, SPEM. One of the advantages of MSACSA is that it provides not just the global minimum alignment but also many distinct low-lying suboptimal alignments for a given objective function. This is due to the fact that conformational space annealing can maintain conformational diversity while searching for the conformations with low energies. This characteristic can help alleviate the problem arising from using an inaccurate score function. The method was the key factor for our success in the recent blind protein structure prediction experiment.
Biophysical Journal 09/2008; 95(10):4813-9. · 3.65 Impact Factor
-
ABSTRACT: The global structural optimization is carried out for off-lattice protein AB models in two and three dimensions by conformational space annealing. The models consist of hydrophobic and hydrophilic monomers in Fibonacci sequences. To accelerate the convergence, we have introduced a shift operator in the internal coordinate system, and effectively reduced the search space by forming a quotient space. With this, we significantly improve our previous results on AB models, and provide new low energy conformations. This work provides insights on exploring complicated energy landscapes by exploiting the advantages and limitations of CSA.
Journal of Computational Chemistry 06/2008; 29(14):2479-84. · 4.58 Impact Factor
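The Fibonacci sequences of hydrophobic (A) and hydrophilic (B) monomers mentioned in the abstract above can be generated with a short recursion. The abstract does not spell out the rule, so this sketch assumes the convention commonly used for AB models: S0 = "A", S1 = "B", and each later sequence is the concatenation of the two preceding ones, S(i+1) = S(i-1) + S(i). The function name `fibonacci_ab` is hypothetical.

```python
def fibonacci_ab(n):
    """Return the n-th Fibonacci sequence of A/B monomers.

    Assumed convention: S0 = "A", S1 = "B", S(i+1) = S(i-1) + S(i),
    where "+" is string concatenation.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, curr = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        # Concatenate the two most recent sequences, older one first.
        prev, curr = curr, prev + curr
    return curr

# Chain lengths follow the Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, 21, ...
chain_13 = fibonacci_ab(6)
lengths = [len(fibonacci_ab(i)) for i in range(8)]
```

Under this convention `fibonacci_ab(6)` is the 13-monomer chain, and the lengths of successive chains follow the Fibonacci numbers.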
-
ABSTRACT: For high-accuracy template-based-modeling of CASP7 targets, we have applied a procedure based on the rigorous optimization of score functions at three stages: multiple alignment, chain building, and side-chain modeling. We applied the conformational space annealing method to a newly developed consistency based score function for multiple alignment. For chain building, we optimized the MODELLER energy function. For side-chain modeling, we optimized a SCWRL-like energy function using a rotamer library constructed specifically for a given target sequence. By rigorous optimization, we have achieved significant improvement in backbone as well as side-chain modeling for TBM and TBM/HA targets. For most TBM/HA targets (17/26), the predicted model was more accurate than the model one can construct from the best template in a posteriori fashion. It appears that the current method can extract relevant information out of multiple templates.
Proteins Structure Function and Bioinformatics 02/2007; 69 Suppl 8:83-9. · 3.39 Impact Factor
-
Michael Tress,
Jianlin Cheng,
Pierre Baldi,
Keehyoung Joo,
Jinwoo Lee,
Joo-Hyun Seo,
Jooyoung Lee,
David Baker,
Dylan Chivian,
David Kim,
Iakes Ezkurdia
ABSTRACT: This paper details the assessment process and evaluation results for the Critical Assessment of Protein Structure Prediction (CASP7) domain prediction category. Domain predictions were assessed using the Normalized Domain Overlap score introduced in CASP6 and the accuracy of prediction of domain break points. The results of the analysis clearly demonstrate that the best methods are able to make consistently reliable predictions when the target has a structural template, although they are less good when the domain break occurs in a region not covered by a template. The conditions of the experiment meant that it was impossible to draw any conclusions about domain prediction for free modeling targets and it was also difficult to draw many distinctions between the best groups. Two thirds of the targets submitted were single domains and hence regarded as easy to predict. Even those targets defined as having multiple domains always had at least one domain with a similar template structure.
Proteins Structure Function and Bioinformatics 02/2007; 69 Suppl 8:137-51. · 3.39 Impact Factor
-
ABSTRACT: A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and the fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. We apply PROFESY for benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us the best model-one structure for the target T0161.
Proteins Structure Function and Bioinformatics 10/2004; 56(4):704-14. · 3.39 Impact Factor
-
ABSTRACT: Protein structure prediction has great potential for understanding the function of proteins at the molecular level and for designing novel protein functions. Here, we report a rapid and accurate structure prediction system that runs in an automated manner. Since fold recognition of the target protein to be modeled is the starting point of the template-guided model building process, various approaches – such as profile analysis, threading, and SCOP fold classification – have been applied to generate the template library and to select the best template structure. After the best template was determined, fold consistency within the template candidates was considered using TM-score and the SCOP database to select additional template structures from the template library. To generate a total of 100 decoy sets, MODELLER was used with the selected template structure. The predicted decoys were clustered with an RMSD criterion of 3 Å to obtain centroids from each cluster. Finally, the selected centroids were subjected to side-chain rearrangement using the SCWRL module. Our fully automated structure prediction system was examined with sample test sets consisting of 80 recently released PDB chains. Judged by the TM-score (≥0.4), we concluded that 60 cases (75%) showed similar structures of statistical significance. This prediction system provides users with simple and reliable models within hours of query submission, so it can readily be used for high-throughput enzyme screening.
Enzyme and Microbial Technology.
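The decoy clustering step in the abstract above (an RMSD criterion of 3 Å, one centroid per cluster) can be sketched as follows. The abstract does not name the clustering algorithm, so this is only one plausible reading: a greedy scheme that repeatedly picks the decoy with the most neighbors within the cutoff as a centroid. The function name `cluster_decoys` and the precomputed pairwise RMSD matrix are assumptions for illustration, and the toy matrix is made-up data.

```python
def cluster_decoys(rmsd, cutoff=3.0):
    """Greedy clustering on a precomputed pairwise RMSD matrix (in Å).

    Returns a list of (centroid_index, member_indices) tuples. This is an
    assumed algorithm, not necessarily the one used by the authors.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # Centroid: the remaining decoy with the most remaining neighbors
        # within the cutoff; ties go to the lowest index.
        centroid = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if rmsd[i][j] <= cutoff),
        )
        members = sorted(j for j in remaining if rmsd[centroid][j] <= cutoff)
        clusters.append((centroid, members))
        remaining -= set(members)
    return clusters

# Toy symmetric RMSD matrix for four decoys (hypothetical values).
toy_rmsd = [
    [0.0, 1.0, 2.0, 8.0],
    [1.0, 0.0, 1.0, 8.0],
    [2.0, 1.0, 0.0, 9.0],
    [8.0, 8.0, 9.0, 0.0],
]
clusters = cluster_decoys(toy_rmsd, cutoff=3.0)
```

With the toy matrix, the first three decoys fall into one cluster and the fourth forms its own, so two centroids would be passed on to side-chain rearrangement.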