Nelle Varoquaux
University of California, Berkeley | UCB

About

61 Publications
13,248 Reads
2,713 Citations
Citations since 2017
44 Research Items
2435 Citations

Publications (61)
Article
Full-text available
Motivation: We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum like...
Article
Full-text available
Free and open‐source software projects have become essential digital infrastructure over the past decade. These projects are largely created and maintained by unpaid volunteers, presenting a potential vulnerability if the projects cannot recruit and retain new volunteers. At the same time, their development on open collaborative development platfor...
Article
The shifts in adaptive strategies revealed by ecological succession and the mechanisms that facilitate these shifts are fundamental to ecology. These adaptive strategies could be particularly important in communities of arbuscular mycorrhizal fungi (AMF) mutualistic with sorghum where strong AMF succession replaces initially ruderal species with co...
Chapter
Just as in eukaryotes, high-throughput chromosome conformation capture (Hi-C) data have revealed nested organizations of bacterial chromosomes into overlapping interaction domains. In this chapter, we present a multiscale analysis framework aiming at capturing and quantifying these properties. These include both standard tools (e.g., contact laws)...
Article
Full-text available
Renewable fuels are needed to replace fossil fuels in the immediate future. Lignocellulosic bioenergy crops provide a renewable alternative that sequesters atmospheric carbon. To prevent displacement of food crops, it would be advantageous to grow biofuel crops on marginal lands. These lands will likely face more frequent and extreme drought condit...
Article
Full-text available
Bacteria of the genus Streptomyces are prolific producers of specialized metabolites, including antibiotics. The linear chromosome includes a central region harboring core genes, as well as extremities enriched in specialized metabolite biosynthetic gene clusters. Here, we show that chromosome structure in Streptomyces ambofaciens correlates with g...
Preprint
Full-text available
In proteomic differential analysis, FDR control is often performed through a multiple test correction (i.e., the adjustment of the original p-values). In this protocol, we apply a recent and alternative method, based on so-called knockoff filters. It shares interesting conceptual similarities with the target-decoy competition procedure, classically...
Chapter
In proteomic differential analysis, FDR control is often performed through a multiple test correction (i.e., the adjustment of the original p-values). In this protocol, we apply a recent and alternative method, based on so-called knockoff filters. It shares interesting conceptual similarities with the target–decoy competition procedure, classically...
Preprint
Arbuscular mycorrhizal fungi (AMF), the mutualistic symbionts with most crops, constitute a research system of human-associated fungi whose relative simplicity and synchrony are conducive to experimental ecology. However, little is known about the shifts in adaptive strategies of sorghum-associated AMF where strong AMF succession replaces initiall...
Article
Full-text available
Recent studies have demonstrated that drought leads to dramatic, highly conserved shifts in the root microbiome. At present, the molecular mechanisms underlying these responses remain largely uncharacterized. Here we employ genome-resolved metagenomics and comparative genomics to demonstrate that carbohydrate and secondary metabolite transport func...
Preprint
Full-text available
We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood appr...
Preprint
Full-text available
Streptomyces are among the most prolific bacterial producers of specialized metabolites, including antibiotics. The linear chromosome is partitioned into a central region harboring core genes and two extremities enriched in specialized metabolite biosynthetic gene clusters (SMBGCs). The molecular mechanisms governing structure and function of these...
Preprint
Full-text available
Streptomyces are among the most prolific bacterial producers of specialized metabolites, including antibiotics. The linear chromosome is partitioned into a central region harboring core genes and two extremities enriched in specialized metabolite biosynthetic gene clusters (SMBGCs). The molecular mechanisms governing structure and function of these...
Preprint
Just as in eukaryotes, high-throughput chromosome conformation capture (Hi-C) data have revealed nested organizations of bacterial chromosomes into overlapping interaction domains. In this chapter, we present a multiscale analysis framework aiming at capturing and quantifying these properties. These include both standard tools (e.g. contact laws) a...
Preprint
Full-text available
Genome-wide contact frequencies obtained using Hi-C-like experiments have raised novel challenges in terms of visualization and rationalization of chromosome structuring phenomena. In bacteria, display of Hi-C data should be congruent with the circularity of chromosomes. However, standard representations in the form of square matrices or horizon...
Preprint
Full-text available
The 3D organization of the genome plays a key role in many cellular processes, such as gene regulation, differentiation, and replication. Assays like Hi-C measure DNA-DNA contacts in a high-throughput fashion, and inferring accurate 3D models of chromosomes can yield insights hidden in the raw data. For example, structural inference can account for...
Preprint
Full-text available
There are many recommendations of "best practices" for those doing data science, data-intensive research, and research in general. These documents usually present a particular vision of how people should work with data and computing, recommending specific tools, activities, mechanisms, and sensibilities. However, implementation of best (or better)...
Preprint
Turnover is a fact of life for any project, and academic research teams can face particularly high levels of people who come and go through the duration of a project. In this article, we discuss the challenges of turnover and some potential practices for helping manage it, particularly for computational- and data-intensive research teams and projec...
Preprint
Full-text available
What actions can we take to foster diverse and inclusive workplaces in the broad fields around data science? This paper reports from a discussion in which researchers from many different disciplines and departments raised questions and shared their experiences with various aspects around diversity, inclusion, and equity. The issues we discuss inclu...
Article
Full-text available
Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) “libraries” – curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to us...
Preprint
What are the challenges and best practices for doing data-intensive research in teams, labs, and other groups? This paper reports from a discussion in which researchers from many different disciplines and departments shared their experiences on doing data science in their domains. The issues we discuss range from the technical to the social, includ...
Article
Full-text available
Background: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a mat...
Article
Full-text available
The development of new ways to probe samples for the three-dimensional (3D) structure of DNA paves the way for in-depth and systematic analyses of the genome architecture. 3C-like methods coupled with high-throughput sequencing can now assess physical interactions between pairs of loci in a genome-wide fashion, thus enabling the creation of genome-...
Preprint
Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) "libraries" -- curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to u...
Article
Full-text available
The development of malaria parasites throughout their various life cycle stages is coordinated by changes in gene expression. We previously showed that the three-dimensional organization of the Plasmodium falciparum genome is strongly associated with gene expression during its replication cycle inside red blood cells. Here, we analyze genome organi...
Preprint
Full-text available
The development of malaria parasites throughout their various life cycle stages is controlled by coordinated changes in gene expression. We previously showed that the three-dimensional organization of the P. falciparum genome is strongly associated with gene expression during its replication cycle inside red blood cells. Here, we analyzed genome or...
Article
Clustering procedures typically estimate which data points are clustered together, a quantity of primary importance in many analyses. Often used as a preliminary step for dimensionality reduction or to facilitate interpretation, finding robust and stable clusters is often crucial for appropriate downstream analysis. In the present work, we cons...
Preprint
Full-text available
Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data. Chromosome conformation data, such as Hi-C, is no different. The most widely used type of normalization of Hi-C data casts estimations of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact as...
Article
The 8th edition of the European Conference on Python in Science, EuroSciPy, was held for the second time in the beautiful city of Cambridge, UK, from August 26th to 29th, 2014. More than 200 participants, both from academia and industry, attended the conference. As usual, the conference kicked off with two days of tutorials, divided into an introduc...
Thesis
The structure of DNA, chromosomes and genome organization is a topic that has fascinated the field of biology for many years. Most research focused on the one-dimensional structure of the genome, studying the linear organizations of genes and genomes and their link with gene expression and regulation, splicing, DNA methylation… Yet, spatial and tem...
Article
Full-text available
Several recently developed experimental methods, each an extension of the chromatin conformation capture (3C) assay, have enabled the genome-wide profiling of chromatin contacts between pairs of genomic loci in 3D. Especially in complex eukaryotes, data generated by these methods, coupled with other genome-wide datasets, demonstrated that non-rando...
Article
Full-text available
HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-effici...
Article
Full-text available
Centromeres are essential for proper chromosome segregation. Despite extensive research, centromere locations in yeast genomes remain difficult to infer, and in most species they are still unknown. Recently, the chromatin conformation capture assay, Hi-C, has been re-purposed for diverse applications, including de novo genome assembly, deconvolutio...
Article
Plasmodium falciparum is the most deadly human malarial parasite, responsible for an estimated 207 million cases of disease and 627,000 deaths in 2012. Recent studies reveal that the parasite actively regulates a large fraction of its genes throughout its replicative cycle inside human red blood cells and that epigenetics plays an important role in...
Article
Full-text available
Motivation: Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate 3D models of how chromosomes fold and fit into the nucleus. Many existing inferen...
Article
Full-text available
Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dial...
Article
These are the proceedings of the 6th European Conference on Python in Science, EuroSciPy 2013, that was held in Brussels (21-25 August 2013).
Article
Full-text available
The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome archite...
Article
Full-text available
Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate three-dimensional models of how chromosomes fold and fit into the nucleus. Many existing infer...
Article
Full-text available
A critical component of the learning process lies in the feedback that students receive on their work that validates their progress, identifies flaws in their thinking, and identifies skills that still need to be learned. Many higher-education institutions have developed an active pedagogy that gives students opportunities for different forms of as...
