Nicola Soranzo

Earlham Institute | TGAC · Data Infrastructure & Algorithms

PhD


Publications: 63
Reads: 12,076
Citations: 5,509
Introduction
Nicola Soranzo currently works at the Earlham Institute as Galaxy Platform Development officer. Nicola does research in Genetics, Bioinformatics and Systems Biology.
Additional affiliations
September 2009 - December 2014
CRS4 Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
Position
  • Researcher
Education
November 2005 - October 2009
Scuola Internazionale Superiore di Studi Avanzati di Trieste
Field of study
  • Functional and Structural Genomics
October 1997 - July 2005
University of Udine
Field of study
  • Computer Science


Publications (63)
Preprint
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analys...
Article
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galax...
Article
Full-text available
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely acc...
Preprint
There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For over a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. In order to streamline the process...
Preprint
Full-text available
Background: Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many available tools to process this data require both bioinformatics skills and high computational power to process big datasets. Furthermore, there are only a few tools that allow for long read amplicon data analysis. To bridge this gap,...
Chapter
Full-text available
A complete RNA-Seq analysis involves the use of several different tools, with substantial software and computational requirements. The Galaxy platform simplifies the execution of such bioinformatics analyses by embedding the needed tools in its web interface, while also providing reproducibility. Here, we describe how to perform a reference-based R...
Article
Full-text available
Background The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more...
Preprint
Full-text available
Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to m...
Preprint
Full-text available
Background The vast ecosystem of single-cell RNA-seq tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more toward...
Article
Full-text available
Background: It is not a trivial step to move from single-cell RNA-sequencing (scRNA-seq) data production to data analysis. There is a lack of intuitive training materials and easy-to-use analysis tools, and researchers can find it difficult to master the basics of scRNA-seq quality control and the later analysis. Results: We have developed a ran...
Preprint
Full-text available
Background It is not a trivial step to move from single-cell RNA-seq (scRNA-seq) data production to data analysis. There is a lack of intuitive training materials and easy-to-use analysis tools, and researchers can find it difficult to master the basics of scRNA-seq quality control and analysis. Results We have developed a range of easy-to-use scr...
Article
Full-text available
Background Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterisation enables the identific...
Article
Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains d...
Article
The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a communit...
Article
Full-text available
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three k...
Preprint
Full-text available
The primary problem with the explosion of biomedical datasets is not the data itself, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a c...
Article
Full-text available
Background Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancest...
Preprint
Full-text available
We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 co...
Presentation
The phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of gene families and plays a vital role in finding ancestral gene duplication events as well as identifying regions that are under positive selection within species. The Ensembl GeneTrees pipeline generates gene trees based on coding sequen...
Poster
The phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of gene families and plays a vital role in finding ancestral gene duplication events as well as identifying regions that are under positive selection within species. The Ensembl GeneTrees pipeline generates gene trees based on coding sequen...
Article
Full-text available
Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work, and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinfor...
Article
Background Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestr...
Article
Full-text available
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has le...
Poster
GeneSeqToFamily is an open-source Galaxy workflow based on the Ensembl GeneTrees pipeline. The workflow helps users to run their analyses without using the command-line while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. It also allows users to subsequently visualise these gene families us...
Poster
The study of homologous genes enables the tracing back of conserved functionality through evolution and finds relationships among species. There are many tools to visualise syntenic information among species, representing gene order and orientation, but they do not provide details about structural diversity within genes and between gene families. A...
Code
Command-line utilities to assist in developing tools for the Galaxy Project. http://galaxyproject.org
Article
Full-text available
Obesity is linked to type 2 diabetes (T2D) and cardiovascular diseases; however, the underlying molecular mechanisms remain unclear. We aimed to identify obesity-associated molecular features that may contribute to obesity-related diseases. Using circulating monocytes from 1,264 Multi-Ethnic Study of Atherosclerosis participants, we quantified the...
Article
Full-text available
The NCBI BLAST suite has become ubiquitous in modern molecular biology and is used for small tasks such as checking capillary sequencing results of single PCR products, genome annotation or even larger scale pan-genome analyses. For early adopters of the Galaxy web-based biomedical data analysis platform, integrating BLAST into Galaxy was a natural...
Preprint
Full-text available
Background: The NCBI BLAST suite has become ubiquitous in modern molecular biology, used for small tasks like checking capillary sequencing results of single PCR products through to genome annotation or even larger scale pan-genome analyses. For early adopters of the Galaxy web-based biomedical data analysis platform, integrating BLAST was a natura...
Article
Full-text available
Background: Transcriptomic studies hold great potential towards understanding the human aging process. Previous transcriptomic studies have identified many genes with age-associated expression levels; however, small sample sizes and mixed cell types often make these results difficult to interpret. Results: Using transcriptomic profiles in CD14+...
Conference Paper
Full-text available
In this work we present a strategy to integrate Hadoop-based applications into the Galaxy platform along with an extensible implementation of this adapter and related utilities. The strategy is based on the idea of introducing a new Galaxy datatype that provides a layer of indirection, thus relaxing the requirement to place data on a Galaxy-accessi...
Article
Full-text available
BioBlend.objects is a new component of the BioBlend package, adding an object-oriented interface for the Galaxy REST-based application programming interface. It improves support for metacomputing on Galaxy entities by providing higher-level functionality and allowing users to more easily create programs to explore, query and create Galaxy datasets...
Article
Full-text available
End-to-end NGS microbiology data analysis requires a diversity of tools covering bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation and metagenomics. However, the construction of computational pipelines that use different software packages is difficult due to a lack of interoperability, reproducibility, and tr...
Article
Full-text available
Accurate estimation of parameters of biochemical models is required to characterize the dynamics of molecular processes. This problem is intimately linked to identifying the most informative experiments for accomplishing such tasks. While significant progress has been made, effective experimental strategies for parameter identification and for dist...
Article
In this chapter, the in silico systems genetics dataset, used as a benchmark in the rest of the book, is described in detail, in particular regarding its simulation by SysGenSIM. Moreover, the algorithms underlying the generation of the gene expression data and the genotype values are fully illustrated.
Conference Paper
As the rate of samples to process increases, manually performing and tracking operations becomes increasingly difficult, costly and error-prone, while processing the massive amounts of data poses significant computational challenges. We will present how combining scientific workflow applications (Galaxy) with state-of-the-art processing technologie...
Article
Full-text available
Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico micr...
Article
Full-text available
Given a large-scale biological network represented as an influence graph, in this article we investigate possible decompositions of the network aimed at highlighting specific dynamical properties. The first decomposition we study consists in finding a maximal directed acyclic subgraph of the network, which dynamically corresponds to searching for a...
Article
Full-text available
SysGenSIM is a software package to simulate Systems Genetics (SG) experiments in model organisms, for the purpose of evaluating and comparing statistical and computational methods and their implementations for analyses of SG data [e.g. methods for expression quantitative trait loci (eQTL) mapping and network inference]. SysGenSIM allows the user to...
Article
Full-text available
Reverse-engineering gene networks from expression profiles is a difficult problem for which a multitude of techniques have been developed over the last decade. The yearly organized DREAM challenges allow for a fair evaluation and unbiased comparison of these methods. We propose an inference algorithm that combines confidence matrices, computed as t...
Article
Full-text available
In this paper we propose three different graph-theoretical decompositions of large-scale biological networks, all three aiming at highlighting specific dynamical properties of the system. The first consists in finding a maximal directed acyclic subgraph in the network, which dynamically corresponds to searching for the maximal open-loop subsystem...
Article
Full-text available
The authors use ideas from graph theory in order to determine how distant a given biological network is from being monotone. On the signed graph representing the system, the minimal number of sign inconsistencies (i.e. the distance to monotonicity) is shown to be equal to the minimal number of fundamental cycles having a negative sign. Suitable ope...
Article
Full-text available
ERNEST (Reaction Network Equilibria Study Toolbox) is a MATLAB package which, by checking various criteria on the structure of a chemical reaction network, can exclude the multistationarity of the corresponding reaction system. The results obtained are independent of the rate constants of the reactions, and can be used for model discriminat...
Article
Full-text available
In yeast, genome-wide periodic patterns associated with energy-metabolic oscillations have been shown recently for both short (approx. 40 min) and long (approx. 300 min) periods. The dynamical regulation due to mRNA stability is found to be an important aspect of the genome-wide coordination of the long-period yeast metabolic cycle. It is shown tha...
Conference Paper
Full-text available
The gene expression response of yeast to various types of stresses/perturbations shows a common pattern for the vast majority of genes, characterized by a quick transient peak followed by a return to the basal level (adaptation). In order to model this transient and the consequent adaptation, we use the idea of integral feedback (the integral repre...
Article
Full-text available
The concept of reverse engineering a gene network, i.e., of inferring a genome-wide graph of putative gene-gene interactions from compendia of high throughput microarray data has been extensively used in the last few years to deduce/integrate/validate various types of "physical" networks of interactions among genes or gene products. This paper give...
Article
Full-text available
In recent years, devising methods for discovering gene regulatory mechanisms at a genome-wide level has become a fundamental topic in the field of systems biology. The aim is to infer gene-gene interactions in an increasingly sophisticated and reliable way through the continuous improvement of reverse engineering algorithms exploiting microarray d...
Conference Paper
The concept of reverse engineering a gene network, i.e., of inferring a genome-wide graph of putative gene-gene interactions from high throughput microarray data has been used extensively in the last few years to deduce/integrate/validate various types of "physical" networks of interactions among genes or gene products. This paper investigates which...
Conference Paper
In this work we compare the predictive power of some of the most popular algorithms used for gene network inference, seen as an unsupervised graph learning problem. The data, generated by an artificial model of a gene regulatory network, are taken in different conditions, like at equilibrium or during a time course, and different numbers of samples...
Article
Full-text available
Inferring a gene regulatory network exclusively from microarray expression profiles is a difficult but important task. The aim of this work is to compare the predictive power of some of the most popular algorithms in different conditions (like data taken at equilibrium or time courses) and on both synthetic and real microarray data. We are in parti...
