Guillaume Rizk

Institute for Research in IT and Random Systems | IRISA · Genscale


48 Publications
12,565 Reads
3,968 Citations


Publications (48)
Article
Full-text available
The recent availability of the first commercial quantum computers has provided a promising tool to tackle NP-hard problems which can only be solved heuristically with present techniques. However, it is unclear whether the current state of quantum computing already provides a quantum advantage over the current state of the art in classical computing. Thi...
Article
Full-text available
This paper assesses the performance of the D-Wave 2X (DW) quantum annealer for finding a maximum clique in a graph, one of the most fundamental and important NP-hard problems. Because the size of the largest graphs DW can directly solve is quite small (usually around 45 vertices), we also consider decomposition algorithms intended for larger graphs...
Article
Full-text available
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic dat...
Chapter
This chapter deals with the compression of genomic data without reference genomes. It presents various techniques which have been specifically developed to compress sequencing data in lossless or lossy modes. The chapter also provides an evaluation of different NGS data compressor tools.
Preprint
Full-text available
In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI)...
Conference Paper
This paper assesses the performance of the D-Wave 2X (DW) quantum annealer for finding a maximum clique in a graph, one of the most fundamental and important NP-hard problems. Because the size of the largest graphs DW can directly solve is quite small (usually around 45 vertices), we also consider decomposition algorithms intended for larger graphs...
Article
Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limitations on the number of input elements they can process, due to high construction time, RAM or external memory usage. We revisit a simple algorithm and show that it i...
Article
Full-text available
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNA...
Preprint
Full-text available
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. Among the plethora of reconstructed transcripts, one of the main bottlenecks consists in correctly identifying the different classes of RNAs, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNA...
Article
Full-text available
Background: Data volumes generated by next-generation sequencing (NGS) technologies are now a major concern for both data storage and transmission. This triggered the need for more efficient methods than general-purpose compression tools, such as the widely used gzip method. Results: We present a novel reference-free method meant to compress data...
Article
Full-text available
Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundament...
Article
Full-text available
Motivation: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping. However, there remains a lack of practical methods...
Article
Full-text available
Motivation: Efficient and fast next-generation sequencing (NGS) algorithms are essential to analyze the terabytes of data generated by the NGS machines. A serious bottleneck can be the design of such algorithms, as they require sophisticated data structures and advanced hardware implementation. Results: We propose an open-source library dedicated to...
Article
Full-text available
The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (> 30 GB). We propose a new encoding of the de Bruijn graph...
Conference Paper
Full-text available
Background / Purpose: The MINIA software was developed to drastically reduce the memory footprint needed for genome assembly, enabling a human genome to be assembled on a desktop computer (see Chikhi and Rizk 2012). Main conclusion: This work shows that the genome assembly program MINIA is able to assemble a 100 Mbp genome on a Raspberry Pi, a v...
Article
Unmapped reads are often discarded from the analysis of whole genome re-sequencing, even though new biological information can be discovered from their analysis. In this paper, we investigated these reads from the re-sequencing data of thirty-three aphid genomes. The unmapped reads for each individual were retrieved from the results of the...
Article
Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state of the art k-mer counting methods require that a large data structure resides in memory. Such structure typically grows with the number of distinct k-mers to count. We present a new streaming algor...
Article
Bioinformatics requires the analysis of large amounts of data. With the recent advent of next-generation sequencing technologies generating data at low cost, the computational power needed has increased dramatically. Graphics Processing Units (GPUs) are now programmable beyond simple graphics computations, providing cheap high performance for gener...
Article
Full-text available
The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for lar...
Article
Full-text available
Small non-coding RNAs (sncRNAs) have been abundantly described as strongly implicated in the post-transcriptional regulation of transcribed units in eukaryote genomes. Silencing of genes, pseudogenes or transposable elements can be operated in the germline or in the soma through microRNA (miRNAs), PIWI associated RNAs (piRNAs) or endogenous short i...
Article
Full-text available
Post-transcriptional regulation in eukaryotes can be operated through microRNA (miRNA) mediated gene silencing. MiRNAs are small (18-25 nucleotides) non-coding RNAs that play a crucial role in the regulation of gene expression in eukaryotes. In insects, miRNAs have been shown to be involved in multiple mechanisms such as embryonic development, tissue di...
Data
A. pisum microRNAs. List of predicted miRNAs from the pea aphid genome. The table presents the name, sequence of the mature A. pisum predicted miRNA, sequence of the hairpin precursor(s), reference of the genomic scaffold that includes the microRNA within the A. pisum genome, strand sense, method used for identification of the miRNA, clusterisation...
Data
Abundance of A. pisum miRNA candidates and their mir* within the Solexa reads. Abundance of A. pisum miRNA and miRNA candidates and their corresponding mir* within the Solexa reads. A. pisum predicted microRNAs were designated as miRNAs if they have abundant reads (≥5) and/or their corresponding mir* were identified within the reads, otherwise...
Data
miRBase accession number. miRBase accession number of A. pisum miRNAs.
Data
GR4500 set up. Set of features used by GR4500 to discriminate between miRNA and non-miRNA hairpins.
Data
Sequencing of small RNAs from A. pisum parthenogenetic colony. The results of the sequencing of small RNAs from a parthenogenetic colony of A. pisum, and the results of the microfluidic analysis of the expression of A. pisum microRNAs in various morphs, have been deposited in Gene Expression Omnibus (GEO). The GEO accession numbers are provided.
Article
Full-text available
Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetab...
Article
Full-text available
Post-transcriptional regulation in eukaryotes can be operated through microRNA (miRNA) mediated gene silencing. MiRNAs are small (18-25 nucleotides) non-coding RNAs that play a crucial role in the regulation of gene expression in eukaryotes. In insects, miRNAs have been shown to be involved in multiple mechanisms such as embryonic development, tissue di...
Conference Paper
Full-text available
Many bioinformatics studies require the analysis of RNA or DNA structures. More specifically, extensive work is done to elaborate efficient algorithms able to predict the 2-D folding structures of RNA or DNA sequences. However, the high computational complexity of the algorithms, combined with the rapid increase of genomic data, triggers the need...
Article
Full-text available
We propose a new splitting approach to extend the decision trees to temporal data. The proposed split aims to determine for each daughter node the representative time series, the observation period best discriminating the output variable, and the optimal contribution of the values and of the behavior for the proximity evaluation. A new extension of...
