ABSTRACT: Recent approaches for predicting the three-dimensional (3D) structure of proteins, such as de novo or fold recognition methods, mostly rely on simplified energy potential functions and a reduced representation of the polypeptide chain. These simplifications make it easier to explore the protein conformational space, but they do not fully capture the subtle relationship between the amino acid sequence and its native structure. It has been proposed that physics-based energy functions, combined with techniques for sampling the conformational space such as Monte Carlo or molecular dynamics (MD) simulations, are better suited to modelling proteins at higher resolution than the models produced by the former type of methods. In this study we monitor different protein structural properties along MD trajectories to discriminate correct from erroneous models. These models are based on the sequence-structure alignments provided by our fold recognition method, FROST. We define correct models as those built from alignments of sequences with structures similar to their native structures, and erroneous models as those built from alignments of sequences with structures unrelated to their native structures.
For three test sequences whose native structures belong to the all-alpha, all-beta and alpha/beta classes, we built a set of models intended to cover the whole spectrum, from a perfect model (i.e., the native structure) to a very poor model (i.e., a random alignment of the test sequence with a structure belonging to another structural class), including several intermediate models based on fold recognition alignments. We subjected these models to 11 ns of MD simulation at three different temperatures. Along the corresponding trajectories we monitored the mean Root-Mean-Square deviation (RMSd) with respect to the initial conformation, the RMSd fluctuations, the number of conformation clusters, the evolution of secondary structures and the surface area of residues. No single criterion is 100% effective at discriminating correct from erroneous models: the mean RMSd, RMSd fluctuation, secondary structure and conformation clustering criteria give some false positives, whereas the residue surface area criterion gives false negatives. However, when these criteria are considered in combination, it is straightforward to discriminate the two types of models.
The ability to discriminate correct from erroneous models allows us to improve the specificity and sensitivity of our fold recognition method for a number of ambiguous cases.
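The first two criteria above (mean RMSd with respect to the initial conformation, and RMSd fluctuations) can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that each trajectory frame is an N x 3 NumPy array of atomic coordinates (for example C-alpha atoms) in a fixed atom order, that every frame is superposed onto the first frame with the Kabsch algorithm before the RMSd is computed, and that the standard deviation of the RMSd series is used as the "fluctuation" measure; that last choice is our assumption. The function names kabsch_rmsd and rmsd_statistics are hypothetical.

```python
import numpy as np

def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSd between two N x 3 coordinate sets after optimal superposition (Kabsch)."""
    p = mobile - mobile.mean(axis=0)          # remove translation
    q = reference - reference.mean(axis=0)
    h = p.T @ q                               # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q                      # rotated mobile minus reference
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def rmsd_statistics(frames):
    """Mean and standard deviation (assumed 'fluctuation') of the RMSd of every
    frame with respect to the first, i.e. initial, conformation."""
    reference = frames[0]
    values = np.array([kabsch_rmsd(f, reference) for f in frames])
    return float(values.mean()), float(values.std())
```

The abstract reports that this criterion alone yields some false positives, so in practice it would be read together with the clustering, secondary structure and surface area criteria rather than on its own.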
ABSTRACT: Fold recognition methods are promising tools for predicting the structure of a protein from its amino acid sequence, but their use is still restricted by the need for huge computational resources and for suitable, efficient algorithms. The most recent version of the FROST (Fold Recognition-Oriented Search Tool) package implements the most efficient algorithm for solving the Protein Threading Problem (PTP), thanks to a close collaboration between the SYMBIOSE group at IRISA and MIG in Jouy-en-Josas. In this paper, we present the various components of FROST, emphasizing recent advances in formulating and solving new versions of the PTP and the way a million instances are solved on a computer cluster in a reasonable time.
ABSTRACT: Fold recognition methods are promising tools for predicting the structure of a protein from its amino acid sequence, but their use is still restricted by the need for huge computational resources and for suitable, efficient algorithms. The most recent version of the FROST (Fold Recognition-Oriented Search Tool) package implements the most efficient algorithm for solving the Protein Threading Problem (PTP), thanks to a close collaboration between the SYMBIOSE group at IRISA and MIG in Jouy-en-Josas. In this paper, we present the various components of FROST, emphasizing recent advances in formulating and solving new versions of the PTP and the way a million instances are solved on a computer cluster in a reasonable time. Key-words: Protein Threading Problem, Protein Structure, Parallel Processing. IRISA, Campus de Beaulieu, 35042 Rennes, France
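The PTP places the core segments of a 3D template onto the query sequence, in order and without overlap, so as to optimize a score. When pairwise contact terms between segments are included, the general problem is NP-hard, which is why dedicated solvers such as the one in FROST are needed. The sketch below is not FROST's algorithm: it drops the pairwise terms and keeps only a score for placing each segment at each sequence position, a simplification that makes the problem solvable exactly by dynamic programming. The score matrix, the function name thread_singleton and the absence of minimum or maximum loop lengths are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")

def thread_singleton(seg_lengths: Sequence[int],
                     score: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Place m ordered, non-overlapping core segments on a sequence of length n.

    score[i][t] is a (hypothetical) score for starting segment i at sequence position t.
    Pairwise interaction terms are ignored, so dynamic programming is exact here.
    Returns (best total score, start position of each segment).
    """
    m, n = len(seg_lengths), len(score[0])
    best = [[NEG_INF] * n for _ in range(m)]   # best[i][t]: segments 0..i placed, i starts at t
    back = [[-1] * n for _ in range(m)]        # start of segment i-1 in that optimum
    for t in range(n - seg_lengths[0] + 1):
        best[0][t] = score[0][t]
    for i in range(1, m):
        run_val, run_arg = NEG_INF, -1         # running max over placements of segment i-1
        for t in range(n):
            prev = t - seg_lengths[i - 1]      # latest start of segment i-1 that ends before t
            if prev >= 0 and best[i - 1][prev] > run_val:
                run_val, run_arg = best[i - 1][prev], prev
            if run_val > NEG_INF and t + seg_lengths[i] <= n:
                best[i][t] = run_val + score[i][t]
                back[i][t] = run_arg
    last = max(range(n), key=lambda t: best[m - 1][t])
    if best[m - 1][last] == NEG_INF:
        raise ValueError("segments do not fit in the sequence")
    starts = [last]
    for i in range(m - 1, 0, -1):              # follow back-pointers to segment 0
        starts.append(back[i][starts[-1]])
    return best[m - 1][last], starts[::-1]
```

Adding pairwise terms between segments breaks the independence that the running maximum relies on, which is where the combinatorial difficulty of the real PTP comes from.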
ABSTRACT: FROST (Fold Recognition-Oriented Search Tool) is software designed to assign a 3D structure to a protein sequence. It is based on a series of filters and uses a database of about 1200 known 3D structures, each associated with empirically determined score distributions. FROST uses these distributions to normalize the score obtained when a protein sequence is aligned with a particular 3D structure. Computing these distributions is extremely time-consuming: it requires solving about 1,200,000 hard combinatorial optimization problems and takes about 40 days on a 2.4 GHz computer. This paper describes how FROST has been successfully redesigned and structured into modules and independent tasks. The new package organization allows these tasks to be distributed and executed in parallel using a centralized dynamic load-balancing strategy. On a cluster of 12 PCs, computing the score distributions now takes about 3 days, which corresponds to a parallelization efficiency of about 1.
International Parallel and Distributed Processing Symposium. 04/2005; 8:200a.
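A centralized dynamic load-balancing scheme of the kind described above can be sketched as a master that keeps a single queue of independent tasks and hands the next one to whichever worker becomes free. This is only an illustration, not FROST's implementation: solve_threading_instance is a hypothetical placeholder for one sequence-structure threading problem, and a local thread pool stands in for the 12 PCs of the cluster. The last function computes parallel efficiency as E = T_serial / (p x T_parallel). With the figures quoted (about 40 days serially, about 3 days on 12 PCs) this gives 40 / (12 x 3), roughly 1.1, consistent with the reported efficiency of about 1.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Iterable, Tuple

def solve_threading_instance(seq_id: int, struct_id: int) -> float:
    """Hypothetical placeholder for one hard threading problem (returns a dummy score)."""
    return float((seq_id * 31 + struct_id * 17) % 100)

def compute_scores(pairs: Iterable[Tuple[int, int]],
                   n_workers: int) -> Dict[Tuple[int, int], float]:
    """Master side: every task is submitted to the executor's single shared queue,
    and each idle worker pulls the next pending task, so faster workers or shorter
    tasks naturally lead to more tasks being processed by that worker."""
    results: Dict[Tuple[int, int], float] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {pool.submit(solve_threading_instance, s, t): (s, t) for s, t in pairs}
        for fut in as_completed(futures):     # collect results in completion order
            results[futures[fut]] = fut.result()
    return results

def parallel_efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """E = T_serial / (p * T_parallel); E close to 1 means near-linear speedup."""
    return t_serial / (n_procs * t_parallel)
```

For CPU-bound solvers written in Python, a process pool or separate machines would replace the threads; the scheduling idea (one central queue, workers pulling tasks on demand) stays the same.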
ABSTRACT: A number of methods are now available to perform automatic assignment of periodic secondary structures from atomic coordinates, based on different characteristics of the secondary structures. In general, these methods broadly agree on the location of most helix and strand core segments in protein structures. However, the termini of the segments are often ill-defined, and it is difficult to decide unambiguously which residues at the edge of the segments should be included. In addition, there is a "twilight zone" where secondary structure segments depart significantly from the idealized models of Pauling and Corey. For these segments, one has to decide whether the observed structural variations are merely distortions or whether they constitute a break in the secondary structure.
To address these problems, we have developed a method for secondary structure assignment called KAKSI. Assignments made by KAKSI are compared with those given by DSSP, STRIDE, XTLSSTR, PSEA and SECSTR, as well as with the secondary structures found in PDB files, on four datasets (X-ray structures in different resolution ranges, and NMR structures).
A detailed comparison of KAKSI assignments with those of STRIDE and PSEA reveals that KAKSI assigns slightly longer helices and strands than STRIDE in cases of one-to-one correspondence between segments. However, KAKSI also tends to assign several short helices where STRIDE and PSEA assign longer, kinked helices. Helices assigned by KAKSI have geometrical characteristics close to those described in the PDB, and they are more linear than helices assigned by other methods. The same tendency to split long segments is observed for strands, although less systematically. We present a number of secondary structure assignments that illustrate this behavior.
Our method provides valuable assignments which favor the regularity of secondary structure segments.
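Comparisons such as "KAKSI assigns slightly longer helices" or "KAKSI splits long segments" rest on extracting segments from per-residue assignments. The sketch below is a hypothetical helper, not part of KAKSI: it assumes each assignment is a string with one state letter per residue ('H' helix, 'E' strand, 'C' other), lists the maximal runs of a given state, and computes the fraction of residues on which two assignments agree. The example strings are made up to mimic one long kinked helix versus the same region split into two shorter helices.

```python
from itertools import groupby
from typing import List, Tuple

def segments(assignment: str, state: str) -> List[Tuple[int, int]]:
    """Maximal runs of `state`, as (first, last) residue indices, both inclusive."""
    segs, pos = [], 0
    for s, run in groupby(assignment):        # consecutive identical letters form one run
        length = sum(1 for _ in run)
        if s == state:
            segs.append((pos, pos + length - 1))
        pos += length
    return segs

def per_residue_agreement(a: str, b: str) -> float:
    """Fraction of residues that receive the same state in both assignments."""
    if len(a) != len(b) or not a:
        raise ValueError("assignments must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

one_long = "CHHHHHHHHHC"    # one long (kinked) helix
two_short = "CHHHHCHHHHC"   # the same region split into two helices
print(segments(one_long, "H"))    # [(1, 9)]
print(segments(two_short, "H"))   # [(1, 4), (6, 9)]
print(per_residue_agreement(one_long, two_short))
```

A high per-residue agreement can hide a difference in segment count, as here, which is why segment-level comparisons are reported alongside residue-level ones.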
ABSTRACT: High molecular weight glutenin subunits (HMW-GS) are of particular interest because of their biomechanical properties, which are important in many food systems such as breadmaking. Using fold recognition techniques, we identified a fold compatible with the N-terminal domain of HMW-GS Dy10. This fold corresponds to the one adopted by proteins of the cereal inhibitor family. Using three known protein structures of this family as templates, we built three models of the N-terminal domain of HMW-GS Dy10. We analyzed these models and propose a number of hypotheses about the properties of the N-terminal domain that can be tested experimentally. In particular, we discuss two possible modes of interaction between the N-terminal domains of the y-type HMW glutenin subunits. The first consists of the formation of interchain disulfide bridges. According to our models, we propose two plausible scenarios: (1) an intrachain disulfide bridge between cysteines 22 and 44, leaving the three other cysteines free to engage in intermolecular bonds; and (2) two intrachain disulfide bridges (cysteines 22-44 and cysteines 10-55), leaving a single cysteine (45) free to form an intermolecular disulfide bridge. We discuss these scenarios in relation to contradictory experimental results. The second mode, although less likely, is nevertheless worth considering: the N-terminal domain of Dy10 (Nt-Dy10) might form oligomers, because homologous cereal inhibitor proteins are known to exist as monomers, homodimers and heterooligomers. We also discuss, in relation to the function of the cereal inhibitor proteins, the possibility that this N-terminal domain has retained similar inhibitory functions.
Protein Science 02/2003; 12(1):34-43.
ABSTRACT: To assess the reliability of fold assignments to protein sequences, we developed a fold recognition method called FROST (Fold Recognition-Oriented Search Tool), based on a series of filters, together with a database specifically designed to benchmark this method under realistic conditions. This benchmark database consists of proteins for which at least one other protein with an extensively similar 3D structure exists in a database of representative 3D structures (i.e., more than 65% of the residues in both proteins can be structurally aligned). Because our method must be tested under conditions similar to those of real fold recognition experiments, the benchmark database contains no protein pair whose sequence similarity is detectable with standard sequence comparison methods such as FASTA. Using FROST, we achieved a coverage of 60% at an error rate of 1%. As a baseline, we ran PSI-BLAST and 3D-PSSM under the same conditions: at a 1% error rate, their coverages were 33% and 56%, respectively.
Proteins: Structure, Function, and Bioinformatics 01/2003; 49(4):493-509.
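Figures such as "a coverage of 60% for an error rate of 1%" come from ranking predictions by score and scanning score thresholds. The sketch below shows one common way of computing such a figure; it is an assumption, not FROST's exact protocol. It takes each query's top-scoring fold assignment as a (score, is_correct) pair, defines the error rate as the fraction of incorrect assignments among those accepted above the threshold, and defines coverage as the fraction of all queries with a correct fold in the database that are correctly assigned. The function name coverage_at_error_rate and the toy data are hypothetical.

```python
from typing import Iterable, Tuple

def coverage_at_error_rate(predictions: Iterable[Tuple[float, bool]],
                           n_positives: int,
                           max_error_rate: float = 0.01) -> float:
    """Best coverage reachable by a score threshold whose error rate is <= max_error_rate.

    predictions: (score, is_correct) for each query's top-scoring assignment.
    n_positives: number of queries for which a correct fold exists in the database.
    Tied scores are not grouped in this simplified sketch.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    for _, correct in ranked:          # lower the threshold one prediction at a time
        if correct:
            tp += 1
        else:
            fp += 1
        if fp / (tp + fp) <= max_error_rate:
            best = max(best, tp / n_positives)
    return best

toy = [(10.0, True), (9.0, True), (8.0, False), (7.0, True)]
print(coverage_at_error_rate(toy, n_positives=4))                      # 0.5
print(coverage_at_error_rate(toy, n_positives=4, max_error_rate=0.3))  # 0.75
```

Under these definitions, comparing methods at a fixed error rate (1% here) amounts to reading each method's coverage at the same point of its own ranked list.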