Marcel Turcotte
University of Ottawa · School of Electrical Engineering and Computer Science
Ph.D. in Computer Science, Université de Montréal
Always actively looking for new collaborations!
About
48
Publications
24,974
Reads
959
Citations
Introduction
I completed my Ph.D. at the Université de Montréal, Canada, under the guidance of Guy Lapalme and Robert Cedergren. Following this, I undertook a postdoctoral fellowship at the University of Florida, USA, working with Steven Benner. Subsequently, I relocated to the United Kingdom to contribute to the Biomolecular Modelling Laboratory, headed by Mike Sternberg, at the Imperial Cancer Research Fund. In 2000, I returned to Canada to join the School of Electrical Engineering and Computer Science.
Additional affiliations
January 2016 - present
January 2013 - present
Education
September 1993 - September 1995
Publications (48)
We previously reported a unique genome with systematically fragmented genes and gene pieces dispersed across numerous circular chromosomes, occurring in mitochondria of diplonemids. Genes are split into up to 12 short fragments (modules), which are separately transcribed and joined in a way that differs from known trans-splicing. Further, cox1 mRNA...
Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph over a random graph are extracted. We hypothesize that these patterns are most likely to rep...
Background
Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), initially introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alle...
RNA elements that are transcribed but not translated into proteins are called non-coding RNAs (ncRNAs). They play wide-ranging roles in biological processes and disorders. Just like proteins, their structure is often intimately linked to their function. Many examples have been documented where structure is conserved across taxa despite sequence div...
Background
Transcription factors (TFs) bind to different parts of the genome in different types of cells, but it is usually assumed that the inherent DNA-binding preferences of a TF are invariant to cell type. Yet, there are several known examples of TFs that switch their DNA-binding preferences in different cell types, and yet more examples of oth...
Transcription factors (TFs) bind to different parts of the genome in different types of cells. These differences may be due to alterations in the DNA-binding preferences of a TF itself, or mechanisms such as chromatin accessibility, steric hindrance, or competitive binding, that result in a DNA "signature" of differential binding. We propose a meth...
Identifying and monitoring hosts of zoonotic RNA viruses, that is, RNA viruses which can be transmitted from one species to another, including the recent SARS-CoV-2 causing the COVID-19 pandemic, is paramount to control their spread. However, efforts to control such spread may be affected if there are unmonitored or unknown hosts. To help identify...
Non-coding RNAs (ncRNAs) are RNA molecules that do not code for protein, but take part in biological processes, including gene expression. Interestingly, like proteins, they can fold into complex structures to perform their wide array of biological functions. Since the folded structure of a ncRNA may be critical to its function, many studies have a...
Protein secondary structure is a crucial information bridge between the primary structure and the tertiary (3D) structure. Precise prediction of 8-state protein secondary structure (PSS) is widely used in the structural and functional analysis of proteins in bioinformatics. Recently, deep learning techniques have bee...
Motivation
Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), initially introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alle...
Recent years have witnessed increased interest in reducing student attrition at universities, due both to improved student outcomes and to the practical fiscal benefits of improved retention. Student attrition can be due to many contributing factors (maturity, motivation, external stresses, etc.) and may be difficu...
This paper describes and evaluates a multi-objective strongly typed genetic programming algorithm for the discovery of network expressions in DNA sequences. Using 13 realistic data sets, we compare the results of our tool, MotifGP, to that of DREME, a state-of-the-art program. MotifGP outperforms DREME when the motifs to be sought are long, and the...
The discovery of common RNA secondary structure motifs is an important problem in bioinformatics. The presence of such motifs is usually associated with key biological functions. However, the identification of structural motifs is far from easy. Unlike motifs in sequences, which have conserved bases, structural motifs have common structure arrangem...
Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. Here our graph represents all possible RNA structures and interactions and we extract the patterns that are most likely to represent a biological mechanism. The graph model we used is a directed dual graph which is randomly s...
Finding relationships between DNA sequence motifs, such as transcription factor binding sites, is an important step to understand transcription regulation in a particular context. Current computational tools are not well adapted for discovering relationships. We have developed a software system, ModuleInducer, which integrates motif finding with th...
Annotation Enrichment Analysis (AEA) is a widely used analytical approach to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information (e.g. GO annotations) for sets of genes identified by experiments (e.g. a set of diff...
RNA interactions are fundamental to a multitude of cellular processes including post-transcriptional gene regulation. Although much progress has been made recently at developing fast algorithms for predicting RNA interactions, much less attention has been devoted to the development of efficient algorithms and data structures for locating RNA intera...
Background / Purpose:
In the past decade, rapid advances have been made in the sequencing, digitization and collection of biological data. However, the bottleneck remains at the point of analysis and extraction of patterns from the data. We have developed an application that is aimed at widening this bottleneck by automating the knowledge ext...
In the protist Diplonema papillatum (Diplonemea, Euglenozoa), mitochondrial genes are systematically fragmented with each nonoverlapping piece (module) encoded individually on a distinct circular chromosome. Gene modules are transcribed separately, and precursor transcripts are assembled to mature mRNA by a trans-splicing process of yet unknown mec...
Annotation Enrichment Analysis (AEA) is a widely used analytical approach to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. We briefly review ideas behind AEA, identify some limitations and propose a novel logic-based Annotation Concept Synthesis and Enrichment Analysis (ACSEA) appro...
A new architecture, called Isometric on-Chip Computer Architecture (ICMA), for massive parallel computing is introduced. This architecture integrates an equal distance organization of processing elements interleaved with memory units. ICMA is inspired by the sodium chloride (NaCl) molecular structure and based on bicolor three-dimensional mesh netw...
Repeated sequences account for a significant fraction of eukaryotic genomes; nearly half of the human genome consists of repeated-sequence elements. Several elements have been linked to diseases. Consequently, identifying and characterizing repeated elements is essential for understanding diseases at the molecular level. Repeated sequences vary fr...
To quickly and efficiently fold long ribonucleic acid (RNA) sequences, fast computational models are needed. This paper compares two parallel multiprocessor computer architectures for the prediction of RNA secondary structure. We show promising experimental results using the OpenMP programming environment. This work is intended to be a testbed for...
Internal ribosome entry sites (IRES) allow ribosomes to be recruited to mRNA in a cap-independent manner. Some viruses that impair cap-dependent translation initiation utilize IRES to ensure that the viral RNA will efficiently compete for the translation machinery. IRES are also employed for the translation of a subset of cellular messages during c...
In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides.
The researc...
The cell has many ways to regulate the production of proteins. One mechanism is through the changes to the machinery of translation initiation. These alterations favor the translation of one subset of mRNAs over another. It was first shown that internal ribosome entry sites (IRESes) within viral RNA genomes allowed the production of viral proteins...
Recent experimental evidence has shown that ribonucleic acid (RNA) plays a greater role in the cell than previously thought. An ensemble of RNA sequences believed to contain signals at the structure level can be exploited to detect functional motifs common to all or a portion of those sequences. We present here a general framework for analyzing m...
The identification of a consensus RNA motif often consists of finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior structural information, hence the need for tools that automate this process.
We present an algorithm called Seed to identify...
Related RNA sequences believed to contain signals at sequence and structure level can be exploited to detect common motifs. Finding these similar structural features can provide substantial information as to which parts of the sequence are functional. For several decades, free energy minimization has been the most popular method for structure predi...
Repeated sequences account for a significant fraction of eukaryotic genomes; nearly half of the human genome consists of repeated sequence elements. Several elements have been linked to diseases. Consequently, identifying and characterizing repeated elements is essential for understanding diseases at the molecular level. Repeated sequences vary fr...
Comparative RNA sequence analyses have contributed remarkably accurate predictions. The recent determination of the 30S and 50S ribosomal subunits brought more supporting evidence. Several inference tools are combining free-energy minimisation and comparative analysis to improve the quality of secondary structure predictions. Using many input seque...
Comparative RNA sequence analyses have contributed remarkably accurate predictions. The recent determination of the 30S and 50S ribosomal subunits brought more supporting evidence. Several inference tools are combining free energy minimisation and comparative analysis to improve the quality of secondary structure predictions. This paper investigat...
Comparative RNA sequence analyses have contributed remarkably accurate predictions. The recent determination of the 30S and 50S ribosomal subunits brought more supporting evidence. Several inference tools are combining free-energy minimisation and comparative analysis to improve the quality of secondary structure predictions. Using many input seque...
Inductive logic programming (ILP) has been applied to automatically discover protein fold signatures. This paper investigates the use of topological information to circumvent problems encountered during previous experiments, namely (1) matching of non-structurally related secondary structures and (2) scaling problems. Cross-validation tests were ca...
As a form of Machine Learning the study of Inductive Logic Programming (ILP) is motivated by a central belief: relational description languages are better (in terms of accuracy and understandability) than propositional ones for certain real-world applications. This claim is investigated here for a particular application in structural molecular biol...
There are constraints on a protein sequence/structure for it to adopt a particular fold. These constraints could be either a local signature involving particular sequences or arrangements of secondary structure or a global signature involving features along the entire chain. To search systematically for protein fold signatures, we have explored the...
Inductive Logic Programming (ILP) has been applied to learn rules which characterise protein folds. Several representations for the background set have been explored and the results have been interpreted in their biological context. In this paper, we present new results obtained with a background set containing information about protein topology. T...
For the last three decades, understanding protein structure has been and still is a challenging problem for molecular biology. The problem is to identify rules which relate the local structure to the complex three-dimensional fold. There are now classification schemes for some 8000 three-dimensional folds. To gain further insights into protein stru...
Inductive Logic Programming (ILP) has been applied to discover rules governing the three-dimensional topology of protein structure. The data set unifies two sources of information: SCOP and PROMOTIF. Cross-validation results for experiments using two background knowledge sets, global (attribute-valued) and constitutional (relational), are present...
A secondary structure has been predicted for the heat shock protein HSP90 family from an aligned set of homologous protein sequences by using a transparent method in both manual and automated implementation that extracts conformational information from patterns of variation and conservation within the family. No statistically significant sequence s...
This paper describes and evaluates a parallel program for determining the three-dimensional structure of nucleic acids. A parallel constraint satisfaction algorithm is used to search a discrete space of shapes. Using two realistic data sets, we compare a previous sequential version of the program written in Miranda to the new sequential and parall...
This paper presents an application of functional programming in the field of molecular biology: exploring the conformations of nucleic acids. The Nucleic Acid three-dimensional structure determination problem (NA3D) and a constraint satisfaction algorithm are formally described. Prototyping and experimental development using the Miranda functional...
Motivated by an application in molecular biology, the prediction of biopolymer three-dimensional structures, an appropriate polymorphic tree search control structure has been implemented using a functional programming language to evaluate different tree search approaches to solve discrete combinatorial problems in three-dimensional space. The control...
Three-dimensional (3-D) structural models of RNA are essential for understanding of the cellular roles played by RNA. Such models have been obtained by a technique based on a constraint satisfaction algorithm that allows for the facile incorporation of secondary and other structural information. The program generates 3-D structures of RNA with atom...
In the last few years, we have seen a rapid increase of the number of known RNA families. For a significant fraction of them, the mechanisms of action remain unclear. Their signature combines structure and sequence information. In most cases, they are difficult to identify from sequence alone. Traditional approaches to identify RNA motifs seek to f...