Hélène Touzet

French National Centre for Scientific Research, Paris, Île-de-France, France

Publications (47) · 56.42 Total impact

  • Azadeh Saffarian · Mathieu Giraud · Hélène Touzet
    ABSTRACT: We introduce the concept of RNA multistructures, which is a formal grammar-based framework specifically designed to model a set of alternate RNA secondary structures. Such alternate structures can either be a set of suboptimal foldings, or distinct stable folding states, or variants within an RNA family. We provide several such examples and propose an efficient algorithm to search for RNA multistructures within a genomic sequence.
    No preview · Article · Mar 2015 · Journal of computational biology: a journal of computational molecular cell biology
  • ABSTRACT: Metatranscriptomic data contributes another piece of the puzzle to understanding the phylogenetic structure and function of a community of organisms. High-quality total RNA is a bountiful mixture of ribosomal, transfer, messenger and other noncoding RNAs, where each family of RNA is vital to answering questions concerning the hidden microbial world. Software tools designed for deciphering metatranscriptomic data fall under two main categories: the first is to reassemble millions of short nucleotide fragments produced by high-throughput sequencing technologies into the original full-length transcriptomes for all organisms within a sample, and the second is to taxonomically classify the organisms and determine their individual functional roles within a community. Species identification is mainly established using the ribosomal RNA genes, whereas the behavior and functionality of a community is revealed by the messenger RNA of the expressed genes. Numerous chemical and computational methods exist to separate families of RNA prior to conducting further downstream analyses, primarily suitable for isolating mRNA or rRNA from a total RNA sample. In this chapter, we demonstrate a computational technique for filtering rRNA from total RNA using the software SortMeRNA. Additionally, we propose a post-processing pipeline using the latest software tools to conduct further studies on the filtered data, including the reconstruction of mRNA transcripts for functional analyses and phylogenetic classification of a community using the ribosomal RNA.
    No preview · Article · Jan 2015 · Methods in molecular biology (Clifton, N.J.)
  • Christophe Vroland · Mikaël Salson · Hélène Touzet

    No preview · Article · Oct 2014

  • No preview · Article · Sep 2014
  • Azadeh Saffarian · Mathieu Giraud · Hélène Touzet

    Preview · Article · Sep 2014
  • Robert Giegerich · Hélène Touzet
    ABSTRACT: Dynamic programming is a classical algorithmic paradigm, which often allows the evaluation of a search space of exponential size in polynomial time. Recursive problem decomposition, tabulation of intermediate results for re-use, and Bellman's Principle of Optimality are its well-understood ingredients. However, algorithms often lack abstraction and are difficult to implement, tedious to debug, and delicate to modify. The present article proposes a generic framework for specifying dynamic programming problems. This framework can handle all kinds of sequential inputs, as well as tree-structured data. Biosequence analysis, document processing, molecular structure analysis, comparison of objects assembled in a hierarchic fashion, and generally, all domains come under consideration where strings and ordered, rooted trees serve as natural data representations. The new approach introduces inverse coupled rewrite systems. They describe the solutions of combinatorial optimization problems as the inverse image of a term rewrite relation that reduces problem solutions to problem inputs. This specification leads to concise yet translucent specifications of dynamic programming algorithms. Their actual implementation may be challenging, but eventually, as we hope, it can be produced automatically. The present article demonstrates the scope of this new approach by describing a diverse set of dynamic programming problems which arise in the domain of computational biology, with examples in biosequence and molecular structure analysis.
    Preview · Article · Mar 2014 · Algorithms
  • Evguenia Kopylova · Laurent Noé · Hélène Touzet
    ABSTRACT: Motivation: The application of next-generation sequencing (NGS) technologies to RNAs directly extracted from a community of organisms yields a mixture of fragments characterizing both coding and non-coding types of RNAs. The task to distinguish among these and to further categorize the families of messenger RNAs and ribosomal RNAs (rRNAs) is an important step for examining gene expression patterns of an interactive environment and the phylogenetic classification of the constituting species. Results: We present SortMeRNA, a new software designed to rapidly filter rRNA fragments from metatranscriptomic data. It is capable of handling large sets of reads and sorting out all fragments matching to the rRNA database with high sensitivity and low running time.
    Full-text · Article · Oct 2012 · Bioinformatics
  • ABSTRACT: RNA locally optimal secondary structures provide a concise and exhaustive description of all possible secondary structures of a given RNA sequence, and hence a very good representation of the RNA folding space. In this paper, we present an efficient algorithm that computes all locally optimal secondary structures for any folding model that takes into account the stability of helical regions. This algorithm is implemented in a software called regliss that runs on a publicly accessible web server.
    Preview · Article · Oct 2012 · Journal of computational biology: a journal of computational molecular cell biology
  • Evguenia Kopylova · Laurent Noé · Hélène Touzet

    No preview · Article · Jul 2012
  • ABSTRACT: The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.
    Full-text · Article · May 2012 · Advances in Bioinformatics
  • ABSTRACT: The annotation of noncoding RNA genes remains a major bottleneck in genome sequencing projects. Most genome sequences released today still come with sets of tRNAs and rRNAs as the only annotated RNA elements, ignoring hundreds of other RNA families. We have developed a web environment that is dedicated to noncoding RNA (ncRNA) prediction, annotation, and analysis and allows users to run a variety of tools in an integrated and flexible manner. This environment offers complementary ncRNA gene finders and a set of tools for the comparison, visualization, editing, and export of ncRNA candidates. Predictions can be filtered according to a large set of characteristics. Based on this environment, we created a public website located at http://RNAspace.org. It accepts genomic sequences up to 5 Mb, which permits online annotation of a complete bacterial genome or a small eukaryotic chromosome. The project is hosted as a SourceForge project (http://rnaspace.sourceforge.net/).
    Full-text · Article · Sep 2011 · RNA
  • ABSTRACT: http://www.pasteur.fr/ip/easysite/pasteur/fr/recherche/communication-scientifique/conferences-et-congres-scientifiques/conferences-service-colloques-institut-pasteur/jobim-2011
    No preview · Article · Jul 2011
  • ABSTRACT: CG-seq is a software pipeline to identify functional regions such as noncoding RNAs or protein coding genes in a genomic sequence by comparative analysis and multispecies comparison. It takes as input a genomic sequence to annotate and a set of other sequences coming from a variety of species to be compared against the user sequence. The pipeline includes several external software components to perform sequence analysis tasks as well as some new features that were especially developed for the purpose. CG-seq is distributed under the GPL licence. It is available both for command line interface usage and with a Graphical User Interface. It can be downloaded from http://bioinfo.lifl.fr/CGseq. A web version can also be run from this same website for input data of limited length.
    Preview · Article · Oct 2010
  • ABSTRACT: We describe a theoretical unifying framework to express the comparison of RNA structures, which we call alignment hierarchy. This framework relies on the definition of common supersequences for arc-annotated sequences and encompasses the main existing models for RNA structure comparison based on trees and arc-annotated sequences with a variety of edit operations. It also gives rise to edit models that have not been studied yet. We provide a thorough analysis of the alignment hierarchy, including a new polynomial-time algorithm and an NP-completeness proof. The polynomial-time algorithm involves biologically relevant edit operations such as pairing or unpairing nucleotides. It has been implemented in a software, called gardenia, which is available at the Web server http://bioinfo.lifl.fr/RNA/gardenia.
    Full-text · Article · Jul 2010 · IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM
  • Laurie Tonon · Hélène Touzet · Jean-Stéphane Varré
    ABSTRACT: DNA-binding transcription factors (TFs) play a central role in transcription regulation, and computational approaches that help in elucidating complex mechanisms governing this basic biological process are of great use. In this perspective, we present the TFM-Explorer web server that is a toolbox to identify putative TF binding sites within a set of upstream regulatory sequences of genes sharing some regulatory mechanisms. TFM-Explorer finds local regions showing overrepresentation of binding sites. Accepted organisms are human, mouse, rat, chicken and drosophila. The server employs a number of features to help users to analyze their data: visualization of selected binding sites on genomic sequences, and selection of cis-regulatory modules. TFM-Explorer is available at http://bioinfo.lifl.fr/TFM.
    Full-text · Article · Jul 2010 · Nucleic Acids Research
  • Aude Liefooghe · Hélène Touzet · Jean-Stéphane Varré
    ABSTRACT: Position Weight Matrices are broadly used probabilistic motif models. In this paper, we address the problem of identifying and characterizing potential overlaps between occurrences of such a motif. It has useful applications to the statistics of the number of occurrences, and to weighted pattern matching with an extension of the well-known Knuth-Morris-Pratt algorithm.
    Full-text · Conference Paper · Apr 2009
  • Arnaud Fontaine · Hélène Touzet
    ABSTRACT: Gene prediction is an essential step in understanding the genome of a species once it has been sequenced. For that, a promising direction in current research on gene finding is a comparative genomics approach. In this paper, we present a novel approach to identifying evolutionarily conserved protein-coding sequences in genomes. The method takes advantage of the specific substitution pattern of coding sequences together with the consistency of reading frames. It has been implemented in a software called PROTEA. Large-scale experimentation shows good results. PROTEA is intended to be a useful complement to existing tools based on homology search or statistical properties of the sequences.
    No preview · Article · Feb 2009 · International Journal of Data Mining and Bioinformatics
  • Arnaud Fontaine · Hélène Touzet

    No preview · Article · Jan 2009
  • ABSTRACT: The divergent domain D8 of the large ribosomal RNA is very variable and extended in vertebrates compared to other eukaryotes. We provide data from 31 species of echinoderms and present the first comparative analysis of the D8 in nonvertebrate deuterostomes. In addition, we obtained 16S mitochondrial DNA sequences for the sea urchin taxa and analyzed single-strand conformation polymorphism (SSCP) of D8 in several populations within the species complex Echinocardium cordatum. A common secondary structure supported by compensatory substitutions and indels is inferred for echinoderms. Variation mostly arises at the tip of the longest stem (D8a), and the most variable taxa also display the longest and most stable D8. The most stable variants are the only ones displaying bulges in the terminal part of the stem, suggesting that selection, rather than maximizing stability of the D8 secondary structure, maintains it in a given range. Striking variation in D8 evolutionary rates was evidenced among sea urchins, by comparison with both 16S mitochondrial DNA and paleontological data. In Echinocardium cordatum and Strongylocentrotus pallidus and S. droebachiensis, belonging to very distant genera, the increase in D8 evolutionary rate is extreme. Their highly stable D8 secondary structures rule out the possibility of pseudogenes. These taxa are the only ones in which interspecific hybridization was reported. We discuss how evolutionary rates may be affected in nuclear relative to mitochondrial genes after hybridization, by selective or mutational processes such as gene silencing and concerted evolution.
    Full-text · Article · Nov 2008 · Journal of Molecular Evolution
  • Arnaud Fontaine · Antoine de Monte · Hélène Touzet
    ABSTRACT: MAGNOLIA is a new software for multiple alignment of nucleic acid sequences, which are recognized to be hard to align. The idea is that the multiple alignment process should be improved by taking into account the putative function of the sequences. In this perspective, MAGNOLIA is especially designed for sequences that are intended to be either protein-coding or structural RNAs. It extracts information from the similarities and differences in the data, and searches for a specific evolutionary pattern between sequences before aligning them. The alignment step then incorporates this information to achieve higher accuracy. The website is available at http://bioinfo.lifl.fr/magnolia.
    Preview · Article · Aug 2008 · Nucleic Acids Research

Publication Stats

598 Citations
56.42 Total Impact Points

Institutions

  • 2008-2015
    • French National Centre for Scientific Research
      • Centre of Molecular Genetics
      Paris, Île-de-France, France
  • 2009-2012
    • University of Lille Nord de France
      Lille, Nord-Pas-de-Calais, France
  • 2004-2009
    • Université des Sciences et Technologies de Lille 1
      • Laboratoire d'Informatique Fondamentale de Lille (LIFL)
      Lille, Nord-Pas-de-Calais, France
  • 2003
    • Université Bordeaux 1
      Talence, Aquitaine, France