Hélène Touzet

University of Lille Nord de France, Lille, Nord-Pas-de-Calais, France

Publications (47) · 54.11 Total impact

  • Azadeh Saffarian, Mathieu Giraud, Hélène Touzet
    ABSTRACT: We introduce the concept of RNA multistructures, which is a formal grammar-based framework specifically designed to model a set of alternate RNA secondary structures. Such alternate structures can either be a set of suboptimal foldings, or distinct stable folding states, or variants within an RNA family. We provide several such examples and propose an efficient algorithm to search for RNA multistructures within a genomic sequence.
    Journal of computational biology: a journal of computational molecular cell biology 03/2015; 22(3):190-204. DOI:10.1089/cmb.2014.0272 · 1.67 Impact Factor
  • ABSTRACT: Metatranscriptomic data contributes another piece of the puzzle to understanding the phylogenetic structure and function of a community of organisms. High-quality total RNA is a bountiful mixture of ribosomal, transfer, messenger and other noncoding RNAs, where each family of RNA is vital to answering questions concerning the hidden microbial world. Software tools designed for deciphering metatranscriptomic data fall under two main categories: the first is to reassemble millions of short nucleotide fragments produced by high-throughput sequencing technologies into the original full-length transcriptomes for all organisms within a sample, and the second is to taxonomically classify the organisms and determine their individual functional roles within a community. Species identification is mainly established using the ribosomal RNA genes, whereas the behavior and functionality of a community is revealed by the messenger RNA of the expressed genes. Numerous chemical and computational methods exist to separate families of RNA prior to conducting further downstream analyses, primarily suitable for isolating mRNA or rRNA from a total RNA sample. In this chapter, we demonstrate a computational technique for filtering rRNA from total RNA using the software SortMeRNA. Additionally, we propose a post-processing pipeline using the latest software tools to conduct further studies on the filtered data, including the reconstruction of mRNA transcripts for functional analyses and phylogenetic classification of a community using the ribosomal RNA.
    Methods in molecular biology (Clifton, N.J.) 01/2015; 1269:279-91. DOI:10.1007/978-1-4939-2291-8_17 · 1.29 Impact Factor
  • Evguenia Kopylova, Laurent Noé, Hélène Touzet
    ABSTRACT: MOTIVATION: The application of Next-Generation Sequencing (NGS) technologies to RNAs directly extracted from a community of organisms yields a mixture of fragments characterizing both coding and non-coding types of RNAs. The tasks to distinguish among these and to further categorize the families of messenger RNAs and ribosomal RNAs is an important step for examining gene expression patterns of an interactive environment and the phylogenetic classification of the constituting species. RESULTS: We present SortMeRNA, a new software designed to rapidly filter ribosomal RNA fragments from metatranscriptomic data. It is capable of handling large sets of reads and sorting out all fragments matching to the rRNA database with high sensitivity and low running time. AVAILABILITY: http://bioinfo.lifl.fr/RNA/sortmerna CONTACT: evguenia.kopylova@lifl.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2012; 28(24):3211-7. DOI:10.1093/bioinformatics/bts611 · 4.62 Impact Factor
  • ABSTRACT: RNA locally optimal secondary structures provide a concise and exhaustive description of all possible secondary structures of a given RNA sequence, and hence a very good representation of the RNA folding space. In this paper, we present an efficient algorithm that computes all locally optimal secondary structures for any folding model that takes into account the stability of helical regions. This algorithm is implemented in a software called regliss that runs on a publicly accessible web server.
    Journal of computational biology: a journal of computational molecular cell biology 10/2012; 19(10):1120-33. DOI:10.1089/cmb.2010.0178 · 1.67 Impact Factor
  • ABSTRACT: The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.
    Advances in Bioinformatics 05/2012; 2012:893048. DOI:10.1155/2012/893048
  • ABSTRACT: The annotation of noncoding RNA genes remains a major bottleneck in genome sequencing projects. Most genome sequences released today still come with sets of tRNAs and rRNAs as the only annotated RNA elements, ignoring hundreds of other RNA families. We have developed a web environment that is dedicated to noncoding RNA (ncRNA) prediction, annotation, and analysis and allows users to run a variety of tools in an integrated and flexible manner. This environment offers complementary ncRNA gene finders and a set of tools for the comparison, visualization, editing, and export of ncRNA candidates. Predictions can be filtered according to a large set of characteristics. Based on this environment, we created a public website located at http://RNAspace.org. It accepts genomic sequences up to 5 Mb, which permits for an online annotation of a complete bacterial genome or a small eukaryotic chromosome. The project is hosted as a Source Forge project (http://rnaspace.sourceforge.net/).
    RNA 09/2011; 17(11):1947-56. DOI:10.1261/rna.2844911 · 4.62 Impact Factor
  • ABSTRACT: DNA-binding transcription factors (TFs) play a central role in transcription regulation, and computational approaches that help in elucidating complex mechanisms governing this basic biological process are of great use. In this perspective, we present the TFM-Explorer web server that is a toolbox to identify putative TF binding sites within a set of upstream regulatory sequences of genes sharing some regulatory mechanisms. TFM-Explorer finds local regions showing overrepresentation of binding sites. Accepted organisms are human, mouse, rat, chicken and drosophila. The server employs a number of features to help users to analyze their data: visualization of selected binding sites on genomic sequences, and selection of cis-regulatory modules. TFM-Explorer is available at http://bioinfo.lifl.fr/TFM.
    Nucleic Acids Research 07/2010; 38(Web Server issue):W286-92. DOI:10.1093/nar/gkq473 · 9.11 Impact Factor
  • ABSTRACT: We describe a theoretical unifying framework to express the comparison of RNA structures, which we call alignment hierarchy. This framework relies on the definition of common supersequences for arc-annotated sequences and encompasses the main existing models for RNA structure comparison based on trees and arc-annotated sequences with a variety of edit operations. It also gives rise to edit models that have not been studied yet. We provide a thorough analysis of the alignment hierarchy, including a new polynomial-time algorithm and an NP-completeness proof. The polynomial-time algorithm involves biologically relevant edit operations such as pairing or unpairing nucleotides. It has been implemented in a software, called gardenia, which is available at the Web server http://bioinfo.lifl.fr/RNA/gardenia.
    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 07/2010; 7(2):309-22. DOI:10.1109/TCBB.2008.28 · 1.54 Impact Factor
  • Arnaud Fontaine, Hélène Touzet
    ABSTRACT: Gene prediction is an essential step in understanding the genome of a species once it has been sequenced. For that, a promising direction in current research on gene finding is a comparative genomics approach. In this paper, we present a novel approach to identifying evolutionarily conserved protein-coding sequences in genomes. The method takes advantage of the specific substitution pattern of coding sequences together with the consistency of reading frames. It has been implemented in a software called PROTEA. Large-scale experimentation shows good results. PROTEA is intended to be a useful complement to existing tools based on homology search or statistical properties of the sequences.
    International Journal of Data Mining and Bioinformatics 02/2009; 3(2):160-76. · 0.66 Impact Factor
  • ABSTRACT: Position Weight Matrices are broadly used probabilistic motif models. In this paper, we address the problem of identifying and characterizing potential overlaps between occurrences of such a motif. It has useful applications to the statistics of the number of occurrences, and to weighted pattern matching with an extension of the well-known Knuth-Morris-Pratt algorithm.
    Language and Automata Theory and Applications, Third International Conference, LATA 2009, Tarragona, Spain, April 2-8, 2009. Proceedings; 01/2009
  • Arnaud Fontaine, Hélène Touzet
  • ABSTRACT: The divergent domain D8 of the large ribosomal RNA is very variable and extended in vertebrates compared to other eukaryotes. We provide data from 31 species of echinoderms and present the first comparative analysis of the D8 in nonvertebrate deuterostomes. In addition, we obtained 16S mitochondrial DNA sequences for the sea urchin taxa and analyzed single-strand conformation polymorphism (SSCP) of D8 in several populations within the species complex Echinocardium cordatum. A common secondary structure supported by compensatory substitutions and indels is inferred for echinoderms. Variation mostly arises at the tip of the longest stem (D8a), and the most variable taxa also display the longest and most stable D8. The most stable variants are the only ones displaying bulges in the terminal part of the stem, suggesting that selection, rather than maximizing stability of the D8 secondary structure, maintains it in a given range. Striking variation in D8 evolutionary rates was evidenced among sea urchins, by comparison with both 16S mitochondrial DNA and paleontological data. In Echinocardium cordatum and Strongylocentrotus pallidus and S. droebachiensis, belonging to very distant genera, the increase in D8 evolutionary rate is extreme. Their highly stable D8 secondary structures rule out the possibility of pseudogenes. These taxa are the only ones in which interspecific hybridization was reported. We discuss how evolutionary rates may be affected in nuclear relative to mitochondrial genes after hybridization, by selective or mutational processes such as gene silencing and concerted evolution.
    Journal of Molecular Evolution 11/2008; 67(5):539-50. DOI:10.1007/s00239-008-9171-8 · 1.86 Impact Factor
  • Arnaud Fontaine, Antoine de Monte, Hélène Touzet
    ABSTRACT: MAGNOLIA is a new software for multiple alignment of nucleic acid sequences, which are recognized to be hard to align. The idea is that the multiple alignment process should be improved by taking into account the putative function of the sequences. In this perspective, MAGNOLIA is especially designed for sequences that are intended to be either protein-coding or structural RNAs. It extracts information from the similarities and differences in the data, and searches for a specific evolutionary pattern between sequences before aligning them. The alignment step then incorporates this information to achieve higher accuracy. The website is available at http://bioinfo.lifl.fr/magnolia.
    Nucleic Acids Research 08/2008; 36(Web Server issue):W14-8. DOI:10.1093/nar/gkn321 · 9.11 Impact Factor
  • ABSTRACT: The main goal of the SEQUOIA project-team is to define appropriate combinatorial models and efficient algorithms for large-scale sequence analysis in molecular biology. Emphasis is placed on the annotation of non-coding regions in genomes (RNA genes and regulatory sequences) via comparative genomics methods. This task involves several complementary issues such as sequence comparison, prediction, analysis and manipulation of RNA secondary structures, identification and processing of regulatory sequences. Efficient algorithms and parallelism on high-performance computing architectures make it possible to handle large-scale instances of such issues. Our aim is to tackle all those issues in an integrated fashion and to put together the developed software tools into a common platform for annotation of non-coding regions. We also explore complementary problems of protein sequence analysis. Those include new approaches to protein sequence comparison on the one hand, and a system for storing and manipulating nonribosomal peptides on the other hand. Special attention is given to the development of robust software, its validation on biological data and its availability from the software platform of the team and by other means. Most research projects are carried out in collaboration with biologists.

Publication Stats

512 Citations
54.11 Total Impact Points

Institutions

  • 2009–2012
    • University of Lille Nord de France
      Lille, Nord-Pas-de-Calais, France
  • 2008–2012
    • French National Centre for Scientific Research
      • Centre of Molecular Genetics
      Paris, Île-de-France, France
  • 2010
    • University of Paris-Est
      Centre, France
  • 2006–2009
    • Université des Sciences et Technologies de Lille 1
      • Laboratoire d'Informatique Fondamentale de Lille (LIFL)
      Lille, Nord-Pas-de-Calais, France