Alain Denise · Université Paris-Saclay · Information Science And Technology
Professor
About
108 Publications
14,739 Reads
3,005 Citations
Introduction
Additional affiliations
October 2002 - present
September 2002 - present
Publications (108)
We present a new approach for the prediction of the coarse-grain 3D structure of RNA molecules. We model a molecule as being made of helices and junctions. Those junctions are classified into topological families that determine their preferred 3D shapes. All the parts of the molecule are then allowed to establish long-distance contacts that induce...
The problem of RNA secondary structure design is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, a given structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence/absence of regulatory motifs, either at a s...
This study investigates the importance of the structural context in the formation of a type I/II A-minor motif. This very frequent structural motif has been shown to be important in the spatial folding of RNA molecules. We developed an automated method to classify A-minor motif occurrences according to their 3D context similarities, and we used a g...
Massive biological datasets are available in various sources. To answer a biological question (e.g., “which are the genes involved in a given disease?”), life scientists query and mine such datasets using various techniques. Each technique provides a list of results usually ranked by importance (e.g., a list of ranked genes). Combining the results...
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure, composed of stems (stacked canonical base pairs) and enclosing loops. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking toget...
Motivations
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure, composed of stems (stacked canonical base pairs) and enclosing loops. While stems are captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking tog...
We study the autocorrelation coefficients of the Rudin–Shapiro polynomials, proving in particular that their maximum on the interval $[1, 2^n)$ is bounded from below by $C_1 2^{\alpha n}$ and is bounded from above by $C_2 2^{\alpha' n}$ where $\alpha = 0.7302852...$ and $\alpha' = 0.7302867...$.
Massive biological datasets are available in public databases and can be queried through portals using keyword queries. Users obtain ranked lists of answers. However, properly querying such portals remains difficult, since various formulations of the same query can be considered (e.g., using synonyms). Consequently, users have to manually comb...
We study the autocorrelation coefficients of the Rudin-Shapiro polynomials, proving in particular that their maximum on the interval $[1, 2^n)$ is bounded from below by $C_1 2^{\alpha n}$ and is bounded from above by $C_2 2^{\alpha' n}$ where $\alpha = 0.7302852...$ and $\alpha' = 0.7302859...$.
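To make the objects concrete, the sketch below builds the Rudin-Shapiro pair through the usual recurrence $P_0 = Q_0 = 1$, $P_{m+1} = P_m + x^{2^m} Q_m$, $Q_{m+1} = P_m - x^{2^m} Q_m$, and computes autocorrelation coefficients by brute force. It assumes that "autocorrelation coefficients" means the aperiodic ones, $c_k = \sum_j a_j a_{j+k}$, over the $\pm 1$ coefficients. The function names are hypothetical, and this is only a numerical illustration, not the proof technique of the paper.

```python
def rudin_shapiro(n):
    """Coefficient lists (p, q) of the Rudin-Shapiro pair P_n, Q_n.

    P_0 = Q_0 = 1, P_{m+1}(x) = P_m(x) + x^(2^m) Q_m(x),
    Q_{m+1}(x) = P_m(x) - x^(2^m) Q_m(x). Multiplying by x^(2^m) shifts Q_m
    past the 2^m coefficients of P_m, so it amounts to list concatenation.
    """
    p, q = [1], [1]
    for _ in range(n):
        p, q = p + q, p + [-c for c in q]
    return p, q


def autocorrelation(a):
    """Aperiodic autocorrelations c_k = sum_j a_j * a_{j+k}, for k = 0..len(a)-1."""
    n = len(a)
    return [sum(a[j] * a[j + k] for j in range(n - k)) for k in range(n)]


p, q = rudin_shapiro(5)
cp, cq = autocorrelation(p), autocorrelation(q)
print(cp[0])                                          # 32, i.e. 2^5
print(all(cp[k] + cq[k] == 0 for k in range(1, 32)))  # True: P, Q are complementary
print(max(abs(c) for c in cp[1:]))                    # largest coefficient on [1, 2^5)
```

Exhaustive computation like this only reaches small n. The bounds stated above concern the asymptotic growth exponent of the maximum.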
Background
In systems biology, there is an acute need for integrative approaches in heterogeneous network mining in order to exploit the continuous flux of genomic data. Simultaneous analysis of the metabolic pathways and genomic context of a given species leads to the identification of patterns consisting of reaction chains catalyzed by products o...
The rich combinatorics of nucleotide base pairing enables RNA molecules to assemble into sophisticated interaction networks, which are used to create complex 3D substructures. These interaction networks are essential for shaping the 3D architecture of the molecule and for providing the key elements that carry out molecular functions such as protei...
Motivation: Predicting the 3D structure of RNA molecules is a key step towards predicting their functions. Methods that work at the atomic or nucleotide level are not suitable for large molecules. In these cases, coarse-grained prediction methods aim to predict a shape which could be refined later by using more precise methods on smaller parts of t...
The multidisciplinary workshop SeqBio 2015 was held at Université Paris-Sud in Orsay on 26 and 27 November 2015. It brought together the computer science and bioinformatics communities working on text analysis methods, as well as biologists and genomicists interested in sequence bioinformatics. Thanks to funding from the GdR CNRS BIM and...
The problem of aggregating multiple rankings into one consensus ranking is an active research topic, especially in the database community. Various studies have implemented methods for rank aggregation and may have reached contradictory conclusions about which algorithms work best. Comparing such results is cumbersome, as the original studies mix...
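As a concrete point of reference, the Borda count is one of the simplest rank aggregation methods. The sketch below assumes each input ranking lists items from best to worst and that an item missing from a ranking scores 0 in it. It is one standard baseline, not necessarily among the algorithms compared in this work. The gene names are hypothetical placeholders.

```python
from collections import defaultdict


def borda(rankings):
    """Aggregate rankings (best item first) with the Borda count.

    An item at position i of a ranking of length n earns n - i points; items
    absent from a ranking earn nothing there. Equal totals are broken
    alphabetically so the output is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, item in enumerate(ranking):
            scores[item] += n - i
    return sorted(scores, key=lambda item: (-scores[item], item))


rankings = [
    ["BRCA1", "TP53", "EGFR"],
    ["TP53", "BRCA1", "KRAS"],
    ["BRCA1", "KRAS", "TP53"],
]
print(borda(rankings))  # ['BRCA1', 'TP53', 'KRAS', 'EGFR']
```

How partial rankings are scored (here, zero for missing items) is exactly the kind of design choice that can make published comparisons disagree.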
Cellular processes involve large numbers of RNA molecules. The functions of these RNA molecules and their binding to molecular machines are highly dependent on their 3D structures. One of the key challenges in RNA structure prediction and modeling is predicting the spatial arrangement of the various structural elements of RNA. As RNA folding is gen...
The U2AF heterodimer has been well studied for its role in defining functional 3' splice sites in pre-mRNA splicing, but many fundamental questions still remain unaddressed regarding the function of U2AF in mammalian genomes. Through genome-wide analysis of U2AF-RNA interactions, we report that U2AF has the capacity to directly define ~88% of funct...
Background
In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods on the same proteome set can lead to distinct orthology predictions.
Methods
We developed...
This paper introduces ConQuR-Bio, which aims to assist scientists when they query public biological databases. Various reformulations of the user query are generated using medical terminologies. Such alternative reformulations are then used to rank the query results using a new consensus ranking strategy. The originality of our approach thus lies...
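The abstract does not detail ConQuR-Bio's consensus strategy. For context, a common formal target for consensus ranking is the Kemeny consensus: the ranking that minimises the total Kendall tau distance (the number of pairwise disagreements) to the input rankings. The brute-force sketch below assumes complete rankings over the same items, and its function names are hypothetical. Its cost is factorial, which is why practical systems use heuristics or approximations.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Number of item pairs ordered differently by two complete rankings."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_consensus(rankings):
    """Permutation minimising the summed Kendall tau distance to all rankings.

    Exhaustive search over all orderings: only usable for a handful of items.
    """
    return min(
        permutations(rankings[0]),
        key=lambda cand: sum(kendall_tau_distance(cand, r) for r in rankings),
    )


print(kemeny_consensus([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]))
# ('a', 'b', 'c'), at total distance 2
```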
RNA molecules play major roles in all cellular processes and have therefore received considerable attention from biologists, biochemists and bioinformaticians in recent years. From a computational optimization point of view, two major interrelated issues are, on the one hand, the problem of structure prediction and, on the other hand, the problem of comparing two or sever...
In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods on the same proteome set can lead to distinct orthology predictions. Methods: We developed a method b...
Random sequences can be used to extract relevant information from biological sequences. The random sequences represent the "background noise" from which it is possible to differentiate the real biological information. For example, random sequences are widely used to detect over-represented and under-represented motifs, or to determine whether the s...
We attempt to characterize the evolutionary origin of the enzymatic repertoire of different fungal groups. The characteristics of each group studied are determined by applying a data mining method to enzyme profiles previously determined by comparative genomics. Through the presentation of results for taxonomic groups Agaricomycet...
We present a new algorithm for generating words of any regular language L uniformly at random. When using floating-point arithmetic, its bit complexity is O(q lg² n) in space and O(qn lg² n) in time, where n is the length of the word and q is the number of states of a deterministic finite automaton for L. We implemented the algorit...
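For comparison, the classical baseline is the "recursive method". It counts, for every state and remaining length, how many accepted completions exist, then picks each letter with probability proportional to those counts. The sketch below uses exact Python integers, so it stores O(qn) big numbers. It is therefore not the floating-point algorithm with the bit complexity stated above. The DFA encoding (a dict of dicts) and the function names are assumptions made for illustration.

```python
import random


def count_table(delta, finals, n):
    """counts[l][q] = number of words of length l leading from state q to a final state.

    delta maps every state to a dict {letter: next_state}; a state with no
    outgoing transitions maps to {}.
    """
    counts = [{q: int(q in finals) for q in delta}]
    for _ in range(n):
        prev = counts[-1]
        counts.append({q: sum(prev[t] for t in delta[q].values()) for q in delta})
    return counts


def uniform_word(delta, start, finals, n, rng=random):
    """Draw a word of length n accepted by the DFA, uniformly at random."""
    counts = count_table(delta, finals, n)
    if counts[n][start] == 0:
        raise ValueError("no accepted word of this length")
    q, word = start, []
    for remaining in range(n, 0, -1):
        # Choose the next letter with probability proportional to the number
        # of accepted completions it leaves; the product telescopes to
        # 1 / counts[n][start] for every accepted word.
        r = rng.randrange(counts[remaining][q])
        for letter, t in sorted(delta[q].items()):
            r -= counts[remaining - 1][t]
            if r < 0:
                word.append(letter)
                q = t
                break
    return "".join(word)


# Hypothetical example: binary words avoiding "11" (state "b" = last letter was 1).
no_11 = {"a": {"0": "a", "1": "b"}, "b": {"0": "a"}}
print(count_table(no_11, {"a", "b"}, 5)[5]["a"])  # 13
print(uniform_word(no_11, "a", {"a", "b"}, 12, rng=random.Random(42)))
```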
We present a general setting for structure-sequence comparison in a large class of RNA structures that unifies and generalizes a number of recent works on specific families of structures. Our approach is based on tree decomposition of structures and gives rise to a general parameterized algorithm, where the exponential part of the complexity depen...
The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forest...
This paper presents several randomised algorithms for generating paths in large models according to a given coverage criterion. Using methods for counting combinatorial structures, these algorithms can efficiently explore very large models, based on a graphical representation by an automaton or by a product of several automata. This new approach ca...
We present an algorithm for automatically predicting the topological family of any RNA three-way junction, given only the information from the secondary structure: the sequence and the Watson-Crick pairings. The parameters of the algorithm have been determined on a data set of 33 three-way junctions whose 3D conformation is known. We applied the al...
Faced with the deluge of data available in biological databases, it becomes increasingly difficult for scientists to obtain reasonable sets of answers to their biological queries. A critical example appears in medicine, where physicians frequently need to get information about genes associated with a given disease. When they pose such queries to We...
In 2004, Condon and coauthors gave a hierarchical classification of exact RNA structure prediction algorithms according to the generality of structure classes that they handle. We complete this classification by adding two recent prediction algorithms. More importantly, we precisely quantify the hierarchy by giving closed or asymptotic formulas for...
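As a small illustration of "counting the structures in a class", the sketch below counts pseudoknot-free secondary structures on n positions, which is the most restrictive class in such a hierarchy. It uses the classical recurrence: either the last position is unpaired, or it pairs with some position j and splits the problem into an outside part and an inside part. The `min_loop` parameter (minimum number of unpaired positions inside a pair) and the function name are illustrative assumptions. With `min_loop=0` the counts are the Motzkin numbers. The paper's formulas for the more general classes are not reproduced here.

```python
def count_secondary_structures(n, min_loop=0):
    """Number of pseudoknot-free secondary structures on n positions.

    A structure is a set of non-crossing base pairs (i, j), i < j, no position
    in two pairs, and j - i - 1 >= min_loop. Sequence content is ignored.
    """
    s = [1] * (n + 1)  # s[0] = s[1] = 1; larger entries are overwritten below
    for m in range(2, n + 1):
        total = s[m - 1]  # position m left unpaired
        # Position m paired with position j: j - 1 positions remain outside
        # the pair and m - j - 1 positions lie inside it.
        for j in range(1, m - min_loop):
            total += s[j - 1] * s[m - j - 1]
        s[m] = total
    return s[n]


print([count_secondary_structures(n) for n in range(7)])  # [1, 1, 2, 4, 9, 21, 51]
print(count_secondary_structures(20, min_loop=3))
```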
Grosu and Smolka have proposed a randomised Monte-Carlo algorithm for LTL model-checking. Their method is based on random exploration of the intersection of the model and of the Büchi automaton that represents the property to be checked. The targets of this exploration are so-called lassos, i.e. elementary paths followed by elementary circuits. Dur...
In comparative protein modeling, the quality of a template model depends heavily on the quality of the initial alignment between a given protein of unknown structure and various template proteins whose tertiary structures are available in the Protein Data Bank (PDB). Although pairwise sequence alignment has been solved for more than three decades,...
In this report, we studied the effect of RNA structures on the activity of exonic splicing enhancers (ESEs) in the SMN1 minigene model by engineering known ESEs into different positions of stable hairpins. We found that a stem as short as 7 bp is sufficient to abolish enhancer activity. When placing ESEs in the loop region, AG-rich ESEs are fully active...
We describe a theoretical unifying framework to express the comparison of RNA structures, which we call alignment hierarchy. This framework relies on the definition of common supersequences for arc-annotated sequences and encompasses the main existing models for RNA structure comparison based on trees and arc-annotated sequences with a variety of e...
We prove that the average complexity of the pairwise ordered tree alignment algorithm of Jiang, Wang and Zhang is in O(nm), where n and m stand for the sizes of the two trees, respectively. We show that the same result holds for the average complexity of pairwise comparison of RNA secondary structures, using a set of biologically relevant operation...
Consider a class of decomposable combinatorial structures, using different types of atoms $\mathcal{Z} = \{\mathcal{Z}_1, \ldots, \mathcal{Z}_{|\mathcal{Z}|}\}$. We address the random generation of such structures with respect to a size n and a targeted distribution in k of its distinguished atoms. We consider two variations on this problem. In the first alternative, the targeted distribution is given by...
Description: VARNA is a tool for the automated drawing, visualization and annotation of the secondary structure of RNA, designed as a companion software for web servers and databases.
Features: VARNA implements four drawing algorithms, supports input/output using the classic formats dbn, ct, bpseq and RNAML and exports the drawing as five picture f...
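VARNA itself is a Java tool. The Python sketch below is not its code: it only illustrates two of the formats listed above. Dot-bracket (dbn) notation marks paired positions with matching parentheses and unpaired ones with dots. BPSEQ lists one line per nucleotide as "index base partner", with partner 0 for unpaired positions. Pseudoknot brackets such as `[` `]` are deliberately out of scope, and the function names are hypothetical.

```python
def parse_dot_bracket(structure):
    """Base pairs (i, j), 1-indexed with i < j, of a '(' / ')' / '.' string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unsupported character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def to_bpseq(sequence, structure):
    """BPSEQ-style lines: index, nucleotide, partner index (0 if unpaired)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = [0] * (len(sequence) + 1)
    for i, j in parse_dot_bracket(structure):
        partner[i], partner[j] = j, i
    return "\n".join(f"{k} {base} {partner[k]}" for k, base in enumerate(sequence, start=1))


print(to_bpseq("GGGAAACCC", "(((...)))"))
# 1 G 9 / 2 G 8 / 3 G 7 / 4 A 0 / 5 A 0 / 6 A 0 / 7 C 3 / 8 C 2 / 9 C 1 (one per line)
```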
This paper describes a set of methods for randomly drawing traces in large models, either uniformly among all traces or with a coverage criterion as target. Classical random walk methods have some drawbacks. In the case of an irregular topology of the underlying graph, choosing the next state uniformly is far from optimal from a coverage point of vie...
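The drawback of the classical (isotropic) walk can be shown on a tiny graph. Choosing each successor uniformly gives paths through low-branching regions far more probability than paths through high-branching ones. Drawing successors in proportion to the number of paths they lead to instead makes every trace of a given length equally likely. The graph, the names and the assumption that every vertex has a successor are all illustrative. This does not reproduce the paper's handling of large or product automata or of coverage criteria.

```python
import random
from collections import Counter
from fractions import Fraction


def paths(graph, start, length):
    """All paths (vertex tuples) with `length` transitions from `start`."""
    result = [(start,)]
    for _ in range(length):
        result = [p + (v,) for p in result for v in graph[p[-1]]]
    return result


def isotropic_probability(graph, path):
    """Probability of `path` when each successor is chosen uniformly."""
    prob = Fraction(1)
    for u in path[:-1]:
        prob /= len(graph[u])
    return prob


def uniform_path(graph, start, length, rng):
    """Path of `length` transitions drawn uniformly among all such paths."""
    # counts[k][v] = number of paths with k transitions starting at v
    counts = [{v: 1 for v in graph}]
    for _ in range(length):
        prev = counts[-1]
        counts.append({v: sum(prev[w] for w in graph[v]) for v in graph})
    path = [start]
    for k in range(length, 0, -1):
        succ = graph[path[-1]]
        path.append(rng.choices(succ, weights=[counts[k - 1][w] for w in succ])[0])
    return tuple(path)


graph = {"s": ["a", "b"], "a": ["a"], "b": ["b", "c"], "c": ["b", "c"]}
for p in paths(graph, "s", 3):
    print("".join(p), isotropic_probability(graph, p))
# saaa gets 1/2; sbbb, sbbc, sbcb, sbcc get 1/8 each (uniform would be 1/5)
rng = random.Random(7)
print(Counter("".join(uniform_path(graph, "s", 3, rng)) for _ in range(5000)))
```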
We used a novel graph-based approach to extract RNA tertiary motifs. We cataloged them all and clustered them using an innovative graph similarity measure. We applied our method to three widely studied structures: Haloarcula marismortui 50S (H.m 50S), Escherichia coli 50S (E. coli 50S), and Thermus thermophilus 16S (T.th 16S) RNAs. We identified 10...
We study the following problem: given a biological sequence S, a multiset M of motifs and an integer k, generate uniformly random sequences which contain the given motifs and have exactly the same frequencies of occurrence of k-lets (i.e. factors of length k) of S. We notably prove that the problem of deciding whether a sequence respects the given...
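Setting the motif multiset M aside (that is where the hardness result lies), the k-let constraint alone can be met with the classical Euler-path construction. Vertices are (k-1)-mers, and each k-let occurrence is an edge. A random "last exit" edge is picked for every vertex except the final one, and the pick is retried until these edges form a tree directed towards the final vertex. The remaining out-edges are then shuffled, and the resulting Euler path is read off. The sketch below is a simplified illustration with hypothetical names. It is intended to make every sequence with the same k-let counts and the same first (k-1)-mer equally likely, and it ignores motifs entirely.

```python
import random
from collections import Counter, defaultdict


def klet_counts(s, k):
    """Multiset of the k-lets (factors of length k) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def _reaches_end(v, end, out, last):
    """Follow the chosen last-exit edges from v; True if they lead to `end`."""
    seen = set()
    while v != end:
        if v in seen:
            return False  # the chosen last-exit edges contain a cycle
        seen.add(v)
        v = out[v][last[v]]
    return True


def klet_shuffle(s, k, rng=random):
    """Random sequence with exactly the k-let counts of s (Euler-path sketch)."""
    if not 2 <= k <= len(s):
        raise ValueError("need 2 <= k <= len(s)")
    # Multigraph on (k-1)-mers: one edge per k-let occurrence.
    out = defaultdict(list)
    for i in range(len(s) - k + 1):
        out[s[i:i + k - 1]].append(s[i + 1:i + k])
    out = dict(out)
    start, end = s[:k - 1], s[len(s) - k + 1:]
    # Rejection step: retry until the last-exit edges form a tree towards `end`.
    while True:
        last = {v: rng.randrange(len(ts)) for v, ts in out.items() if v != end}
        if all(_reaches_end(v, end, out, last) for v in last):
            break
    # Shuffle the other out-edges of each vertex; the last exit goes at the end.
    order = {}
    for v, ts in out.items():
        rest = [t for idx, t in enumerate(ts) if idx != last.get(v)]
        rng.shuffle(rest)
        order[v] = rest + ([ts[last[v]]] if v in last else [])
    # Greedy walk from `start`: with the tree condition it uses every edge once.
    used = Counter()
    v, chars = start, [start]
    while used[v] < len(order.get(v, ())):
        w = order[v][used[v]]
        used[v] += 1
        chars.append(w[-1])
        v = w
    return "".join(chars)


seq = "ACGTTGCAACGTAGCTAGGCTA"
shuffled = klet_shuffle(seq, 2, rng=random.Random(3))
print(shuffled, klet_counts(shuffled, 2) == klet_counts(seq, 2))  # second value: True
```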
We used a novel graph-based approach to identify recurrent RNA tertiary motifs embedded within secondary structure. We catalogued all the secondary structural elements of the RNA molecule and clustered them using an innovative graph similarity measure. We applied our method to three widely studied structures: H.m 50S, E.coli 50S and T.th 16S. We i...
In the last ten years, several tools have been proposed for RNA secondary structure pairwise comparison. These tools use different models (ordered tree or forest, arc annotated sequence, multi-level tree) and methods (edit distance, alignment). We present a first benchmark for comparing these tools. For various RNA families, we built two sets of se...
Pyrrolysyl-tRNA synthetase and its cognate suppressor tRNA(Pyl) mediate pyrrolysine (Pyl) insertion at in frame UAG codons. The presence of an RNA hairpin structure named Pyl insertion structure (PYLIS) downstream of the suppression site has been shown to stimulate the insertion of Pyl in archaea. We study here the impact of the presence of PYLIS o...
This paper presents some first results on how to perform uniform random walks (where every trace has the same probability of occurring) in very large models. The models considered here are described succinctly as a set of communicating reactive modules. The method relies upon techniques for counting and drawing uniformly at random words in...
GenRGenS is a software tool dedicated to randomly generating genomic sequences and structures. It handles several classes of models useful for sequence analysis, such as Markov chains, hidden Markov models, weighted context-free grammars, regular expressions and PROSITE expressions. GenRGenS is the only program that can handle weighted context-free...
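As a minimal picture of the simplest model class mentioned above, the sketch below trains an order-k Markov chain by counting transitions in a sequence and then samples new symbols forward. It is not GenRGenS's implementation or input format. The function names, the requirement that the order be at least 1, and the convention that the seed's length equals the order are all assumptions of this illustration.

```python
import random
from collections import Counter, defaultdict


def train_markov(seq, order):
    """Transition counts of an order-`order` chain: context -> Counter of next symbols."""
    if order < 1:
        raise ValueError("order must be >= 1 in this sketch")
    model = defaultdict(Counter)
    for i in range(order, len(seq)):
        model[seq[i - order:i]][seq[i]] += 1
    return dict(model)


def generate(model, seed, length, rng=random):
    """Extend `seed` (whose length is the chain's order) to `length` symbols."""
    order = len(seed)
    out = seed
    while len(out) < length:
        counts = model.get(out[-order:])
        if not counts:
            raise ValueError(f"context {out[-order:]!r} never has a successor")
        symbols = list(counts)
        out += rng.choices(symbols, weights=[counts[c] for c in symbols])[0]
    return out


model = train_markov("ACGTACGTTGCA", 1)
print(generate(model, "A", 30, rng=random.Random(5)))
```

Sampling in proportion to the observed counts uses the maximum-likelihood transition probabilities of the training sequence.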
Pairwise sequence alignments aim to decide whether two sequences are related and, if so, to exhibit their related domains. Recent works have pointed out that a significant number of true homologous sequences are missed when using classical comparison algorithms. This is the case when two homologous sequences share several short blocks of homology,...
This paper addresses the problem of selecting finite test sets and automating this selection. Among the methods for doing so, some are deterministic and some are statistical. The kind of statistical testing we consider has been inspired by the work of Thevenod-Fosse and Waeselynck. There, the choice of the distribution on the input domain is guided by the str...
Some strings (the texts) are assumed to be randomly generated according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over- or under-representation of a word or a set of words. The aim of this paper is twofold. First, a single word is given. One studies the tail distribution of...
Bousquet-Mélou and Conway in [3] found algebraic equations for the area generating function of directed animals on an infinite family of regular, non planar, two-dimensional lattices by using equivalences with hard particle models. We give in this paper a bijective proof of their results which is a generalization of Viennot's heaps of pieces [6, 8...