Antoine Marin

Institute of Physical and Chemical Biology, Lutetia Parisorum, Île-de-France, France

Publications (10) · 15.89 Total impact

  • Source
    ABSTRACT: Recent approaches for predicting the three-dimensional (3D) structure of proteins, such as de novo or fold recognition methods, mostly rely on simplified energy potential functions and a reduced representation of the polypeptide chain. These simplifications facilitate the exploration of the protein conformational space but do not fully capture the subtle relationship that exists between the amino acid sequence and its native structure. It has been proposed that physics-based energy functions together with techniques for sampling the conformational space, e.g., Monte Carlo or molecular dynamics (MD) simulations, are better suited to the task of modelling proteins at higher resolutions than those of models obtained with the former type of methods. In this study we monitor different protein structural properties along MD trajectories to discriminate correct from erroneous models. These models are based on the sequence-structure alignments provided by our fold recognition method, FROST. We define correct models as being built from alignments of sequences with structures similar to their native structures and erroneous models from alignments of sequences with structures unrelated to their native structures. For three test sequences whose native structures belong to the all-alpha, all-beta and alpha/beta classes we built a set of models intended to cover the whole spectrum: from a perfect model, i.e., the native structure, to a very poor model, i.e., a random alignment of the test sequence with a structure belonging to another structural class, including several intermediate models based on fold recognition alignments. We submitted these models to 11 ns of MD simulations at three different temperatures. We monitored along the corresponding trajectories the mean of the Root-Mean-Square deviations (RMSd) with respect to the initial conformation, the RMSd fluctuations, the number of conformation clusters, the evolution of secondary structures and the surface area of residues. None of these criteria alone is 100% efficient in discriminating correct from erroneous models. The mean RMSd, RMSd fluctuations, secondary structure and clustering of conformations show some false positives whereas the residue surface area criterion shows false negatives. However, if we consider these criteria in combination it is straightforward to discriminate the two types of models. The ability to discriminate correct from erroneous models allows us to improve the specificity and sensitivity of our fold recognition method for a number of ambiguous cases.
    BMC Bioinformatics 02/2008; 9:6. DOI:10.1186/1471-2105-9-6 · 2.67 Impact Factor
  • Source
    ABSTRACT: Fold recognition methods are promising tools for capturing the structure of a protein from its amino acid residue sequence, but their use is still restricted by the need for huge computational resources and suitably efficient algorithms. In the recent version of the FROST (Fold Recognition Oriented Search Tool) package, the most efficient algorithm for solving the Protein Threading Problem (PTP) is implemented, thanks to the strong collaboration between the SYMBIOSE group in IRISA and MIG in Jouy-en-Josas. In this paper, we present the diverse components of FROST, emphasizing the recent advances in formulating and solving new versions of the PTP and the way of solving a million instances on a computer cluster in a reasonable time.
  • ABSTRACT: Fold recognition methods are promising tools for capturing the structure of a protein from its amino acid residue sequence, but their use is still restricted by the need for huge computational resources and suitably efficient algorithms. In the recent version of the FROST (Fold Recognition Oriented Search Tool) package, the most efficient algorithm for solving the Protein Threading Problem (PTP) is implemented, thanks to the strong collaboration between the SYMBIOSE group in IRISA and MIG in Jouy-en-Josas. In this paper, we present the diverse components of FROST, emphasizing the recent advances in formulating and solving new versions of the PTP and the way of solving a million instances on a computer cluster in a reasonable time. Key-words: Protein Threading Problem, Protein Structure, Parallel Processing. IRISA, Campus de Beaulieu, 35042 Rennes, France
    Grid Computing for Bioinformatics and Computational Biology, 04/2007: pages 325-356; ISBN: 9780470191637
  • Source
    ABSTRACT: FROST (Fold Recognition-Oriented Search Tool) is a software tool whose purpose is to assign a 3D structure to a protein sequence. It is based on a series of filters and uses a database of about 1200 known 3D structures, each one associated with empirically determined score distributions. FROST uses these distributions to normalize the score obtained when a protein sequence is aligned with a particular 3D structure. Computing these distributions is extremely time-consuming; it requires solving about 1,200,000 hard combinatorial optimization problems and takes about 40 days on a 2.4 GHz computer. This paper describes how FROST has been successfully redesigned and structured in modules and independent tasks. The new package organization allows these tasks to be distributed and executed in parallel using a centralized dynamic load balancing strategy. On a cluster of 12 PCs, computing the score distributions now takes about 3 days, which represents a parallelization efficiency of about 1.
    04/2005; 8:200a. DOI:10.1109/IPDPS.2005.231
  • Source
    ABSTRACT: A number of methods are now available to perform automatic assignment of periodic secondary structures from atomic coordinates, based on different characteristics of the secondary structures. In general these methods exhibit a broad consensus as to the location of most helix and strand core segments in protein structures. However, the termini of the segments are often ill-defined and it is difficult to decide unambiguously which residues at the edge of the segments have to be included. In addition, there is a "twilight zone" where secondary structure segments depart significantly from the idealized models of Pauling and Corey. For these segments, one has to decide whether the observed structural variations are merely distortions or whether they constitute a break in the secondary structure. To address these problems, we have developed a method for secondary structure assignment, called KAKSI. Assignments made by KAKSI are compared with assignments given by DSSP, STRIDE, XTLSSTR, PSEA and SECSTR, as well as secondary structures found in PDB files, on 4 datasets (X-ray structures with different resolution ranges, NMR structures). A detailed comparison of KAKSI assignments with those of STRIDE and PSEA reveals that KAKSI assigns slightly longer helices and strands than STRIDE in case of one-to-one correspondence between the segments. However, KAKSI also tends to favor the assignment of several short helices when STRIDE and PSEA assign longer, kinked helices. Helices assigned by KAKSI have geometrical characteristics close to those described in the PDB. They are more linear than helices assigned by other methods. The same tendency to split long segments is observed for strands, although less systematically. We present a number of cases of secondary structure assignments that illustrate this behavior. Our method provides valuable assignments which favor the regularity of secondary structure segments.
    BMC Structural Biology 02/2005; 5:17. DOI:10.1186/1472-6807-5-17 · 2.22 Impact Factor
  • Source
    ABSTRACT: An approach to automatic prediction of the amino acid type from NMR chemical shift values of its nuclei is presented here, within the framework of a model to calculate the probability of an amino acid type given the set of chemical shifts. The method relies on systematic use of all chemical shift values contained in the BioMagResBank (BMRB). Two programs were designed, one (BMRB stats) for extracting statistical chemical shift parameters from the BMRB and another one (RESCUE2) for computing the probabilities of each amino acid type, given a set of chemical shifts. The Bayesian prediction scheme presented here is compared to other methods already proposed (PROTYP, RESCUE and PLATON) and is found to be more sensitive and more specific. Using this scheme, we tested various sets of nuclei. The two nuclei carrying the most information are C(beta) and H(beta), in agreement with observations made in Grzesiek and Bax, 1993. Based on four nuclei, H(beta), C(beta), C(alpha) and C', it is possible to increase correct predictions to a rate of more than 75%. Taking into account the correlations between the nuclei chemical shifts has only a slight impact on the percentage of correct predictions: indeed, the largest correlation coefficients display similar features on all amino acids.
    Journal of Biomolecular NMR 10/2004; 30(1):47-60. DOI:10.1023/B:JNMR.0000042948.12381.88 · 3.31 Impact Factor
  • Source
    ABSTRACT: cDNA encoding the endo-1,3-β-d-glucanase from Spisula sachalinensis (LIV) was amplified by PCR using oligonucleotides deduced from the N-terminal end peptide sequence. The predicted enzyme structure consists of 444 amino acids with a signal sequence. The mature enzyme has 316 amino acids and its deduced amino acid sequence coincides completely with the N-terminal end (38 amino acids) of the β-1,3-glucanase (LIV) isolated from the mollusk. The enzyme sequence from Val 121 to Met 441 reveals closest homology with Pacifastacus leniusculus lipopolysaccharide- and β-1,3-glucan-binding protein and with coelomic cytolytic factors from Lumbricus terrestris. The mollusk glucanase also shows 36% identity and 56% similarity with β-1,3-glucanase of the sea urchin Strongylocentrotus purpuratus. It is generally considered that invertebrate glucanase-like proteins containing the bacterial glucanase motif have evolved from an ancient β-1,3-glucanase gene, but most of them lost their glucanase activity in the course of evolution and retained only the glucan-binding activity. A more detailed evaluation of the protein folding revealed interesting relationships between the active site of LIV and other enzymes that hydrolyze native glucans.
    Comparative Biochemistry and Physiology Part B Biochemistry and Molecular Biology 02/2004; 137(2):169-178. DOI:10.1016/j.cbpc.2003.10.018 · 1.90 Impact Factor
  • Source
    ABSTRACT: High molecular weight glutenin subunits (HMW-GS) are of particular interest because of their biomechanical properties, which are important in many food systems such as breadmaking. Using fold-recognition techniques, we identified a fold compatible with the N-terminal domain of HMW-GS Dy10. This fold corresponds to the one adopted by proteins belonging to the cereal inhibitor family. Starting from three known protein structures of this family as templates, we built three models for the N-terminal domain of HMW-GS Dy10. We analyzed these models, and we propose a number of hypotheses regarding the N-terminal domain properties that can be tested experimentally. In particular, we discuss two possible ways of interaction between the N-terminal domains of the y-type HMW glutenin subunits. The first way consists of the creation of interchain disulfide bridges. According to our models, we propose two plausible scenarios: (1) the existence of an intrachain disulfide bridge between cysteines 22 and 44, leaving the three other cysteines free to engage in intermolecular bonds; and (2) the creation of two intrachain disulfide bridges (involving cysteines 22-44 and cysteines 10-55), leaving a single cysteine (45) for creating an intermolecular disulfide bridge. We discuss these scenarios in relation to contradictory experimental results. The second way, although less likely, is nevertheless worth considering. The N-terminal domain of Dy10, Nt-Dy10, might form oligomers, because homologous cereal inhibitor proteins are known to exist as monomers, homodimers, and heterooligomers. We also discuss, in relation to the function of the cereal inhibitor proteins, the possibility that this N-terminal domain has retained similar inhibitory functions.
    Protein Science 02/2003; 12(1):34-43. DOI:10.1110/ps.0229803 · 2.86 Impact Factor
  • ABSTRACT: To assess the reliability of fold assignments to protein sequences, we developed a fold recognition method called FROST (Fold Recognition-Oriented Search Tool) based on a series of filters and a database specifically designed as a benchmark for this new method under realistic conditions. This benchmark database consists of proteins for which there exists at least one other protein with an extensively similar 3D structure in a database of representative 3D structures (i.e., more than 65% of the residues in both proteins can be structurally aligned). Because the testing of our method must be carried out under conditions similar to those of real fold recognition experiments, no protein pair with sequence similarity detectable using standard sequence comparison methods such as FASTA is included in the benchmark database. Using FROST, we achieved a coverage of 60% for an error rate of 1%. To obtain a baseline for our method, we used PSI-BLAST and 3D-PSSM. Under the same conditions, for a 1% error rate, coverages for PSI-BLAST and 3D-PSSM were 33% and 56%, respectively.
    Proteins Structure Function and Bioinformatics 01/2003; 49(4):493-509. DOI:10.1002/prot.10231 · 2.92 Impact Factor