
Nelle Varoquaux
University of California, Berkeley (UCB)
About
61 Publications | 13,248 Reads
2,713 Citations
Publications (61)
Motivation: We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum like...
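The two-step strategy described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: counts are mapped to distances with a power law d = c^(1/α), where α = -3 is an assumed value, and 3D coordinates are then fitted by minimizing raw metric MDS stress with plain gradient descent. The function names are hypothetical.

```python
import math
import random


def counts_to_distances(counts, alpha=-3.0):
    """Map contact counts to distances with d = c ** (1 / alpha).

    alpha = -3.0 is an assumed exponent chosen for illustration. Pairs
    with zero counts get None, meaning "no distance constraint".
    """
    n = len(counts)
    dist = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and counts[i][j] > 0:
                dist[i][j] = counts[i][j] ** (1.0 / alpha)
    return dist


def stress(coords, dist):
    """Raw metric MDS stress: sum over constrained pairs of (||x_i - x_j|| - d_ij)^2."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] is not None:
                total += (math.dist(coords[i], coords[j]) - dist[i][j]) ** 2
    return total


def metric_mds(dist, dim=3, steps=2000, lr=0.01, seed=0):
    """Minimize the raw stress by plain gradient descent from a random start."""
    rng = random.Random(seed)
    n = len(dist)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                if dist[i][j] is None:
                    continue
                cur = math.dist(coords[i], coords[j]) or 1e-12
                # Gradient of (cur - d_ij)^2 w.r.t. x_i is 2 (cur - d_ij) (x_i - x_j) / cur.
                scale = 2.0 * (cur - dist[i][j]) / cur
                for k in range(dim):
                    g = scale * (coords[i][k] - coords[j][k])
                    grad[i][k] += g
                    grad[j][k] -= g
        for i in range(n):
            for k in range(dim):
                coords[i][k] -= lr * grad[i][k]
    return coords


# Toy counts that decay as 1 / s^3 with genomic separation s (in bins).
counts = [[0 if i == j else 1000.0 / abs(i - j) ** 3 for j in range(6)] for i in range(6)]
dist = counts_to_distances(counts)
coords = metric_mds(dist)
print(round(stress(coords, dist), 4))
```

Plain gradient descent keeps the sketch dependency-free; it only shows the shape of the two-step pipeline, not how noise or missing counts should be handled.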
Free and open‐source software projects have become essential digital infrastructure over the past decade. These projects are largely created and maintained by unpaid volunteers, presenting a potential vulnerability if the projects cannot recruit and retain new volunteers. At the same time, their development on open collaborative development platfor...
The shifts in adaptive strategies revealed by ecological succession and the mechanisms that facilitate these shifts are fundamental to ecology. These adaptive strategies could be particularly important in communities of arbuscular mycorrhizal fungi (AMF) mutualistic with sorghum where strong AMF succession replaces initially ruderal species with co...
Just as in eukaryotes, high-throughput chromosome conformation capture (Hi-C) data have revealed nested organizations of bacterial chromosomes into overlapping interaction domains. In this chapter, we present a multiscale analysis framework aiming at capturing and quantifying these properties. These include both standard tools (e.g., contact laws)...
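A contact law P(s), one of the standard tools this abstract mentions, summarizes how contact frequency decays with genomic separation s. The hypothetical helper below is not taken from the chapter. It assumes equal-size bins and, for a circular bacterial chromosome, measures s as the shorter arc between two bins.

```python
def contact_law(matrix, circular=True):
    """Mean contact frequency P(s) as a function of separation s (in bins).

    On a circular chromosome of n bins, the separation between bins i and j
    is the shorter arc: min(|i - j|, n - |i - j|).
    """
    n = len(matrix)
    totals, pairs = {}, {}
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle: each pair counted once
            s = j - i
            if circular:
                s = min(s, n - s)
            totals[s] = totals.get(s, 0.0) + matrix[i][j]
            pairs[s] = pairs.get(s, 0) + 1
    return {s: totals[s] / pairs[s] for s in sorted(totals)}


ring = [[0, 4, 2, 4],
        [4, 0, 4, 2],
        [2, 4, 0, 4],
        [4, 2, 4, 0]]
print(contact_law(ring))                  # {1: 4.0, 2: 2.0}
print(contact_law(ring, circular=False))  # {1: 4.0, 2: 2.0, 3: 4.0}
```

On a log-log plot, the slope of P(s) gives the decay exponent of the contact law.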
Renewable fuels are needed to replace fossil fuels in the immediate future. Lignocellulosic bioenergy crops provide a renewable alternative that sequesters atmospheric carbon. To prevent displacement of food crops, it would be advantageous to grow biofuel crops on marginal lands. These lands will likely face more frequent and extreme drought condit...
Bacteria of the genus Streptomyces are prolific producers of specialized metabolites, including antibiotics. The linear chromosome includes a central region harboring core genes, as well as extremities enriched in specialized metabolite biosynthetic gene clusters. Here, we show that chromosome structure in Streptomyces ambofaciens correlates with g...
In proteomic differential analysis, FDR control is often performed through a multiple test correction (i.e., the adjustment of the original p-values). In this protocol, we apply a recent and alternative method, based on so-called knockoff filters. It shares interesting conceptual similarities with the target-decoy competition procedure, classically...
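The selection step of a knockoff filter can be sketched as follows. This assumes one statistic W_j per protein has already been computed from the original data and its knockoff copy (that step is out of scope here) and uses the knockoff+ threshold of Barber and Candès; whether the protocol uses this exact variant is an assumption. As in target-decoy competition, negative W_j act like decoys and estimate the number of false discoveries. Function names are hypothetical.

```python
import math


def knockoff_plus_threshold(w, q=0.1):
    """Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    Returns math.inf when no such t exists (nothing is selected).
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        est_false = 1 + sum(1 for x in w if x <= -t)    # "decoy" count
        selected = max(1, sum(1 for x in w if x >= t))  # "target" count
        if est_false / selected <= q:
            return t
    return math.inf


def knockoff_select(w, q=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


w = [5.0, 4.0, 3.0, 2.0, 1.0, -0.5]
print(knockoff_select(w, q=0.25))  # [0, 1, 2, 3, 4]
```

Unlike a p-value adjustment, the threshold is chosen directly on the W statistics, so no p-values are needed.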
Arbuscular mycorrhizal fungi (AMF), the mutualistic symbionts of most crops, constitute a research system of human-associated fungi whose relative simplicity and synchrony are conducive to experimental ecology. However, little is known about the shifts in adaptive strategies of sorghum-associated AMF where strong AMF succession replaces initiall...
Recent studies have demonstrated that drought leads to dramatic, highly conserved shifts in the root microbiome. At present, the molecular mechanisms underlying these responses remain largely uncharacterized. Here we employ genome-resolved metagenomics and comparative genomics to demonstrate that carbohydrate and secondary metabolite transport func...
Streptomyces are among the most prolific bacterial producers of specialized metabolites, including antibiotics. The linear chromosome is partitioned into a central region harboring core genes and two extremities enriched in specialized metabolite biosynthetic gene clusters (SMBGCs). The molecular mechanisms governing structure and function of these...
Genome-wide contact frequencies obtained using Hi-C-like experiments have raised novel challenges in terms of visualization and rationalization of chromosome structuring phenomena. In bacteria, the display of Hi-C data should be congruent with the circularity of chromosomes. However, standard representations in the form of square matrices or horizon...
Publisher: Cold Spring Harbor Laboratory. Section: New Results
The 3D organization of the genome plays a key role in many cellular processes, such as gene regulation, differentiation, and replication. Assays like Hi-C measure DNA-DNA contacts in a high-throughput fashion, and inferring accurate 3D models of chromosomes can yield insights hidden in the raw data. For example, structural inference can account for...
There are many recommendations of "best practices" for those doing data science, data-intensive research, and research in general. These documents usually present a particular vision of how people should work with data and computing, recommending specific tools, activities, mechanisms, and sensibilities. However, implementation of best (or better)...
Turnover is a fact of life for any project, and academic research teams can face particularly high levels of people who come and go through the duration of a project. In this article, we discuss the challenges of turnover and some potential practices for helping manage it, particularly for computational- and data-intensive research teams and projec...
What actions can we take to foster diverse and inclusive workplaces in the broad fields around data science? This paper reports from a discussion in which researchers from many different disciplines and departments raised questions and shared their experiences with various aspects around diversity, inclusion, and equity. The issues we discuss inclu...
Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) “libraries” – curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to us...
What are the challenges and best practices for doing data-intensive research in teams, labs, and other groups? This paper reports from a discussion in which researchers from many different disciplines and departments shared their experiences on doing data science in their domains. The issues we discuss range from the technical to the social, includ...
Background: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C pose particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a mat...
The development of new ways to probe samples for the three-dimensional (3D) structure of DNA paves the way for in-depth and systematic analyses of the genome architecture. 3C-like methods coupled with high-throughput sequencing can now assess physical interactions between pairs of loci in a genome-wide fashion, thus enabling the creation of genome-...
The development of malaria parasites throughout their various life cycle stages is coordinated by changes in gene expression. We previously showed that the three-dimensional organization of the Plasmodium falciparum genome is strongly associated with gene expression during its replication cycle inside red blood cells. Here, we analyze genome organi...
The development of malaria parasites throughout their various life cycle stages is controlled by coordinated changes in gene expression. We previously showed that the three-dimensional organization of the P. falciparum genome is strongly associated with gene expression during its replication cycle inside red blood cells. Here, we analyzed genome or...
Clustering procedures typically estimate which data points are clustered together, a quantity of primary importance in many analyses. Since clustering often serves as a preliminary step for dimensionality reduction or to facilitate interpretation, finding robust and stable clusters is often crucial for appropriate downstream analysis. In the present work, we cons...
Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data. Chromosome conformation data, such as Hi-C, are no exception. The most widely used type of normalization of Hi-C data casts the estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact as...
The 8th edition of the European Conference on Python in Science, EuroSciPy, was held for the second time in the beautiful city of Cambridge, UK, from 26 to 29 August 2014. More than 200 participants, both from academia and industry, attended the conference. As usual, the conference kicked off with two days of tutorials, divided into an introduc...
The structure of DNA, chromosomes and genome organization is a topic that has fascinated the field of biology for many years. Most research focused on the one-dimensional structure of the genome, studying the linear organizations of genes and genomes and their link with gene expression and regulation, splicing, DNA methylation… Yet, spatial and tem...
Several recently developed experimental methods, each an extension of the chromatin conformation capture (3C) assay, have enabled the genome-wide profiling of chromatin contacts between pairs of genomic loci in 3D. Especially in complex eukaryotes, data generated by these methods, coupled with other genome-wide datasets, demonstrated that non-rando...
HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-effici...
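Matrix balancing, the idea behind the iterative correction method mentioned in this abstract, rescales a symmetric contact matrix so that every non-empty bin ends up with the same total coverage. The sketch below is a hypothetical pure-Python illustration, not HiC-Pro's implementation. It uses a damped (square-root) symmetric update from the Sinkhorn-Knopp family, because the undamped update, which divides entry (i, j) by f_i * f_j, can oscillate, for example on a diagonal matrix.

```python
import math


def balance(matrix, max_iter=1000, tol=1e-10):
    """Rescale a symmetric, non-negative contact matrix so that every
    non-empty row has the same sum.

    Returns (balanced, bias) such that
    matrix[i][j] == balanced[i][j] * bias[i] * bias[j] (up to rounding).
    """
    n = len(matrix)
    m = [[float(x) for x in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins keep a factor of 1.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        # Damped symmetric update: divide entry (i, j) by sqrt(f_i * f_j).
        roots = [math.sqrt(f) for f in factors]
        for i in range(n):
            for j in range(n):
                m[i][j] /= roots[i] * roots[j]
            bias[i] *= roots[i]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


contacts = [[5, 2, 1],
            [2, 3, 2],
            [1, 2, 9]]
balanced, bias = balance(contacts)
print([round(sum(row), 6) for row in balanced])  # three equal row sums
```

The returned bias vector plays the role of the per-bin correction factors: dividing each raw count by bias[i] * bias[j] yields the balanced matrix.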
Centromeres are essential for proper chromosome segregation. Despite extensive research, centromere locations in yeast genomes remain difficult to infer, and in most species they are still unknown. Recently, the chromatin conformation capture assay, Hi-C, has been re-purposed for diverse applications, including de novo genome assembly, deconvolutio...
Plasmodium falciparum is the most deadly human malarial parasite, responsible for an estimated 207 million cases of disease and 627,000 deaths in 2012. Recent studies reveal that the parasite actively regulates a large fraction of its genes throughout its replicative cycle inside human red blood cells and that epigenetics plays an important role in...
Motivation: Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate 3D models of how chromosomes fold and fit into the nucleus. Many existing inferen...
Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dial...
These are the proceedings of the 6th European Conference on Python in Science, EuroSciPy 2013, which was held in Brussels (21-25 August 2013).
The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome archite...
Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate three-dimensional models of how chromosomes fold and fit into the nucleus. Many existing infer...
A critical component of the learning process lies in the feedback that students receive on their work that validates their progress, identifies flaws in their thinking, and identifies skills that still need to be learned. Many higher-education institutions have developed an active pedagogy that gives students opportunities for different forms of as...