Figure 1
Bioconda development and usage since the beginning of the project. (a) contributing authors and added recipes over time. (b) code line additions and deletions per week. (c) package count per language ecosystem (saturated colors on bottom represent explicitly life science related packages). (d) total downloads per language ecosystem. The term "other" covers all recipes that do not fall into one of the specific categories. Note that a subset of packages that started in Bioconda have since been migrated to the more appropriate, general-purpose conda-forge channel. Older versions of such packages still reside in the Bioconda channel, and as such are included in the recipe count (a) and download count (d). Statistics obtained Oct. 25, 2017.
Source publication
We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 co...
Contexts in source publication
Context 1
... (bioRxiv preprint, first posted online Oct. 21, 2017; doi: http://dx.doi.org/10.1101/207092) ... linearly, on average, with no sign of saturation (Fig. 1a,b). The barrier to entry is low, requiring a willingness to participate and adherence to community guidelines. Many software developers contribute recipes for their own tools, and many Bioconda contributors are invested in the project as they are also users of Conda and Bioconda. Bioconda provides packages from various language ...
Context 2
... community guidelines. Many software developers contribute recipes for their own tools, and many Bioconda contributors are invested in the project as they are also users of Conda and Bioconda. Bioconda provides packages from various language ecosystems like Python, R (CRAN and Bioconductor), Perl, Haskell, as well as a plethora of C/C++ programs (Fig. 1c). Many of these packages have complex dependency structures that require various manual steps to install when not relying on a package manager like Conda (Fig. 2a, Online Methods). With over 6.3 million downloads, the service has become a backbone of bioinformatics infrastructure (Fig. 1d). Bioconda is complemented by the conda-forge ...
Context 3
... Perl, Haskell, as well as a plethora of C/C++ programs (Fig. 1c). Many of these packages have complex dependency structures that require various manual steps to install when not relying on a package manager like Conda (Fig. 2a, Online Methods). With over 6.3 million downloads, the service has become a backbone of bioinformatics infrastructure (Fig. 1d). Bioconda is complemented by the conda-forge project (https://conda-forge.github.io), which hosts software not specifically related to the biological sciences. The two projects collaborate closely, and the Bioconda team maintains over 500 packages hosted by conda-forge. Among all currently available distributions of bioinformatics ...
Citations
... For overall adoption of software solutions it is important the tools and documentation get packaged by software distributions, such as Bioconda [43], ...
Since its introduction in 2011 the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. VCF can present single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called against a reference genome. Here we present over 125 useful and widely used free and open source software tools and libraries, part of vcflib tools and bio-vcf. We also highlight cyvcf2, hts-nim and slivar tools. Application is typically in the comparison, filtering, normalisation, smoothing, annotation, statistics, visualisation and exporting of variants. Our tools run daily and invisibly in pipelines and countless shell scripts. Our tools are part of a wider bioinformatics ecosystem and we consider it very important to make these tools available as free and open source software to all bioinformaticians so they can be deployed through software distributions, such as Debian, GNU Guix and Bioconda. vcflib, for example, was installed over 40,000 times and bio-vcf was installed over 15,000 times through Bioconda by December 2020. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation that cannot easily be represented by the VCF format. All source code is published under free and open source software licenses and can be downloaded and installed from https://github.com/vcflib .
Author summary
Most bioinformatics workflows deal with DNA/RNA variations that are typically represented in the variant call format (VCF) — a file format that describes mutations (SNP and MNP), insertions and deletions (INDEL) against a reference genome. Here we present a wide range of free and open source software tools that are used in biomedical sequencing workflows around the world today.
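As a rough illustration of the variant classes named in this summary (SNP, MNP, insertion, deletion), the sketch below reads one tab-separated VCF data line and labels each ALT allele by comparing its length with REF. It is not taken from vcflib or bio-vcf; the function names `classify_variant` and `parse_vcf_line` are hypothetical, and symbolic or breakend alleles are only flagged, not interpreted.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label one REF/ALT pair by allele length (simplified, illustrative only)."""
    # Symbolic alleles (<DEL>), breakends ([ or ]), '*' and '.' are not classified here.
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic/other"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def parse_vcf_line(line: str) -> list:
    """Return (chrom, pos, ref, alt, kind) for every ALT allele on one data line.

    Assumes the fixed leading VCF columns CHROM, POS, ID, REF, ALT.
    """
    chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
    return [
        (chrom, int(pos), ref, alt, classify_variant(ref, alt))
        for alt in alts.split(",")
    ]


if __name__ == "__main__":
    # Hypothetical record with two ALT alleles at one site.
    record = "20\t1234567\t.\tGTC\tG,GTCT\t50\tPASS\t."
    for row in parse_vcf_line(record):
        print(row)
```

Real pipelines would normally rely on an established parser (for example cyvcf2, mentioned above) rather than string splitting; the sketch only makes the allele-length logic explicit.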
... Data processing tasks were organized into a Snakemake (Koster and Rahmann, 2012) workflow with the help of the hundo package (Brown et al., 2018). Versioned executables were downloaded during runtime using Bioconda (Grüning et al., 2017). ...
Riverbeds are hotspots for microbially-mediated reactions that exhibit pronounced variability in space and time. It is challenging to resolve biogeochemical mechanisms in natural riverbeds, as uncontrolled settings complicate data collection and interpretation. To overcome these challenges, laboratory flumes are often used as proxies for natural riverbed systems. Flumes capture spatiotemporal variability and thus allow for controlled investigations of riverbed biogeochemistry. These investigations implicitly rely on the assumption that the flume microbiome is similar to the microbiome of natural riverbeds. However, this assumption has not been tested and it is unknown how the microbiome of a flume compares to natural aquatic settings, including riverbeds. To evaluate the fundamental assumption that a flume hosts a microbiome similar to natural riverbed systems, we used 16S rRNA gene sequencing and publicly available data to compare the sediment microbiome of a single large laboratory flume to a wide variety of natural ecosystems including lake and marine sediments, river, lake, hyporheic, soil, and marine water, and bank and wetland soils. Richness and Shannon diversity metrics, analyses of variance, Bray-Curtis dissimilarity, and analysis of the common microbiomes between flume and river sediment all indicated that the flume microbiome more closely resembled natural riverbed sediments than other ecosystems, supporting the use of flume experiments for investigating natural microbially-mediated biogeochemical processes in riverbeds.
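For readers unfamiliar with the metrics named in this abstract, the following minimal sketch computes richness, Shannon diversity (natural logarithm) and Bray-Curtis dissimilarity from taxon count vectors using their textbook definitions. It is not the authors' analysis code; the function names and the example counts are invented for illustration.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same taxa")
    denom = sum(a) + sum(b)
    if denom == 0:
        return 0.0  # convention chosen here for two empty samples
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


if __name__ == "__main__":
    # Hypothetical taxon counts for two samples over the same four taxa.
    flume = [40, 30, 20, 10]
    river = [35, 35, 10, 20]
    print(richness(flume), shannon_diversity(flume), bray_curtis(flume, river))
```

A Bray-Curtis value of 0 means identical composition and 1 means no shared taxa, which is why lower values between flume and river sediment support the similarity claim.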
... MetaWRAP is hosted on github (https://github.com/bxlab/metaWRAP), distributed through Anaconda [33], and can be easily installed locally and on remote clusters. The metawrap-mg conda package (https://anaconda.org/ursky/metawrap-mg) ...
Background:
The study of microbiomes using whole-metagenome shotgun sequencing enables the analysis of uncultivated microbial populations that may have important roles in their environments. Extracting individual draft genomes (bins) facilitates metagenomic analysis at the single genome level. Software and pipelines for such analysis have become diverse and sophisticated, resulting in a significant burden for biologists to access and use them. Furthermore, while bin extraction algorithms are rapidly improving, there is still a lack of tools for their evaluation and visualization.
Results:
To address these challenges, we present metaWRAP, a modular pipeline software for shotgun metagenomic data analysis. MetaWRAP deploys state-of-the-art software to handle metagenomic data processing starting from raw sequencing reads and ending in metagenomic bins and their analysis. MetaWRAP is flexible enough to give investigators control over the analysis, while still being easy-to-install and easy-to-use. It includes hybrid algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from metagenomic data through bin consolidation and reassembly. MetaWRAP's hybrid bin extraction algorithm outperforms individual binning approaches and other bin consolidation programs in both synthetic and real data sets. Finally, metaWRAP comes with numerous modules for the analysis of metagenomic bins, including taxonomy assignment, abundance estimation, functional annotation, and visualization.
Conclusions:
MetaWRAP is an easy-to-use modular pipeline that automates the core tasks in metagenomic analysis, while contributing significant improvements to the extraction and interpretation of high-quality metagenomic bins. The bin refinement and reassembly modules of metaWRAP consistently outperform other binning approaches. Each module of metaWRAP is also a standalone component, making it a flexible and versatile tool for tackling metagenomic shotgun sequencing data. MetaWRAP is open-source software available at https://github.com/bxlab/metaWRAP .
... NGLess is available as open source software at http://github.com/ngless-toolkit/ngless . Additionally, NGLess is available as a bioconda package 46 and in container form (through biocontainers 47). Documentation and tutorials can be found at http://ngless.embl.de . ...
NGLess is a domain specific language for describing next-generation sequence processing pipelines. It was developed with the goal of enabling user-friendly computational reproducibility.
Using this framework, we developed NG-meta-profiler, a fast profiler for metagenomes which performs sequence preprocessing, mapping to bundled databases, filtering of the mapping results, and profiling (taxonomic and functional). It is significantly faster than either MOCAT2 or htseq-count and (as it builds on NGLess) its results are perfectly reproducible. These pipelines can easily be customized and extended with other tools.
NGLess and NG-meta-profiler are open source software (under the liberal MIT licence) and can be downloaded from http://ngless.embl.de or installed through bioconda.
... management platform (Fig. 2). When available we used BioConda (Grüning et al., 2017) to install the open source software. Summary statistics of the original input and subsequent downstream results files were collected at each step of the pipeline. ...
Low-cost Illumina sequencing of clinically-important bacterial pathogens has generated thousands of publicly available genomic datasets. Analyzing these genomes and extracting relevant information for each pathogen and the associated clinical phenotypes requires not only resources and bioinformatic skills but organism-specific knowledge. In light of these issues, we created Staphopia, an analysis pipeline, database and application programming interface, focused on Staphylococcus aureus, a common colonizer of humans and a major antibiotic-resistant pathogen responsible for a wide spectrum of hospital and community-associated infections. Written in Python, Staphopia’s analysis pipeline consists of submodules running open-source tools. It accepts raw FASTQ reads as an input, which undergo quality control filtration, error correction and reduction to a maximum of approximately 100× chromosome coverage. This reduction significantly reduces total runtime without detrimentally affecting the results. The pipeline performs de novo assembly-based and mapping-based analysis. Automated gene calling and annotation is performed on the assembled contigs. Read-mapping is used to call variants (single nucleotide polymorphisms and insertion/deletions) against a reference S. aureus chromosome (N315, ST5). We ran the analysis pipeline on more than 43,000 S. aureus shotgun Illumina genome projects in the public European Nucleotide Archive database in November 2017. We found that only a quarter of known multi-locus sequence types (STs) were represented but the top 10 STs made up 70% of all genomes. Methicillin-resistant S. aureus (MRSA) were 64% of all genomes. Using the Staphopia database we selected 380 high quality genomes deposited with good metadata, each from a different multi-locus ST, as a non-redundant diversity set for studying S. aureus evolution.
In addition to answering basic science questions, Staphopia could serve as a potential platform for rapid clinical diagnostics of S. aureus isolates in the future. The system could also be adapted as a template for other organism-specific databases.
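The coverage reduction step described in this abstract can be pictured with a short calculation. The sketch below is an assumption-laden illustration, not Staphopia's implementation: it estimates coverage as total sequenced bases divided by genome size and derives the fraction of reads to keep so that coverage does not exceed a cap. The function names are hypothetical, and the genome size of 2,800,000 bp is an approximate value for S. aureus used here only as an example.

```python
def estimated_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced bases per reference base."""
    return total_bases / genome_size


def subsample_fraction(total_bases: int, genome_size: int,
                       max_coverage: float = 100.0) -> float:
    """Fraction of reads to keep so that coverage is at most max_coverage."""
    cov = estimated_coverage(total_bases, genome_size)
    if cov <= max_coverage:
        return 1.0  # already at or below the cap: keep every read
    return max_coverage / cov


if __name__ == "__main__":
    GENOME_SIZE = 2_800_000      # assumed approximate S. aureus genome size (bp)
    total_bases = 560_000_000    # hypothetical run: 200x coverage before reduction
    print(estimated_coverage(total_bases, GENOME_SIZE))
    print(subsample_fraction(total_bases, GENOME_SIZE))
```

In this hypothetical run half of the reads would be retained; how the reads are actually selected (random sampling, quality ranking, or otherwise) is not specified in the abstract.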
... Leveraging Conda, Bioconda (https://bioconda.github.io) is a community project dedicated to data analysis in life sciences that contains over 4,000 tool packages with contributions by more than 400 authors. Despite the fact that Bioconda is one of the most recent package managers dedicated to biomedical tools, it contains by far the largest number of software tools, underscoring its rapid uptake by the community (see Figure 2 in Grüning et al., 2017). Bioconda packages are well maintained and include a testing system to ensure their quality. ...
Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components (a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines) to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.
... PHASTER (PHAge Search Tool Enhanced Release) (76) was used to search for potential phage insert in the genome assembly. All genomic analyses were performed using Snakemake (77) as workflow manager together with software installations from Bioconda (78). Reads and assemblies for Δ10KG and Δ10LVM have been deposited in the NCBI BioProject Repository (PRJNA454100). ...
Persistence is a reversible and low-frequency phenomenon allowing a subpopulation of a clonal bacterial population to survive antibiotic treatments. Upon removal of the antibiotic, persister cells resume growth and give rise to viable progeny. Type II toxin-antitoxin (TA) systems were assumed to play a key role in the formation of persister cells in Escherichia coli based on the observation that successive deletions of TA systems decreased persistence frequency. In addition, the model proposed that stochastic fluctuations of (p)ppGpp levels are the basis for triggering activation of TA systems. Cells in which TA systems are activated are thought to enter a dormancy state and therefore survive the antibiotic treatment. Using independently constructed strains and newly designed fluorescent reporters, we reassessed the roles of TA modules in persistence both at the population and single-cell levels. Our data confirm that the deletion of 10 TA systems does not affect persistence to ofloxacin or ampicillin. Moreover, microfluidic experiments performed with a strain reporting the induction of the yefM-yoeB TA system allowed the observation of a small number of type II persister cells that resume growth after removal of ampicillin. However, we were unable to establish a correlation between high fluorescence and persistence, since the fluorescence of persister cells was comparable to that of the bulk of the population and none of the cells showing high fluorescence were able to resume growth upon removal of the antibiotic. Altogether, these data show that there is no direct link between induction of TA systems and persistence to antibiotics.
... The containerization keeps the deployment task to a minimum. The selected Galaxy tools are automatically installed from the Galaxy ToolShed [13] (https://toolshed.g2.bx.psu.edu) using the Galaxy API BioBlend [14] and the installation of the tools and their dependencies are automatically resolved using packages available through Bioconda [15] (https://bioconda.github.io). To populate ASaiM with the selected microbiota tools, we then migrated 12 tools/suites of tools and their dependencies to Bioconda (e.g. ...
Background
New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions out of microbiota studies. Modular and user-friendly tools would greatly improve such studies.
Findings
We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore and visualize microbiota information from raw metataxonomic, metagenomic or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets, but also can be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io).
Conclusions
Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation and training to scientists working on complex microorganism communities. It makes analysis and exploration of microbiota data easy, quick, transparent, reproducible and shareable.
... management platform (Fig. 2). When available we used BioConda (Grüning et al., 2017) to install the open source software. Summary statistics of the original input and subsequent downstream results files were collected at each step of the pipeline. ...
Low-cost Illumina sequencing of clinically-important bacterial pathogens has generated thousands of publicly available genomic datasets. Analyzing these genomes and extracting relevant information for each pathogen and the associated clinical phenotypes requires not only resources and bioinformatic skills but organism-specific knowledge. In light of these issues, we created Staphopia, an analysis pipeline, database and Application Programming Interface, focused on Staphylococcus aureus, a common colonizer of humans and a major antibiotic-resistant pathogen responsible for a wide spectrum of hospital and community-associated infections.
Written in Python, Staphopia’s analysis pipeline consists of submodules running open-source tools. It accepts raw FASTQ reads as an input, which undergo quality control filtration, error correction and reduction to a maximum of approximately 100x chromosome coverage. This reduction significantly reduces total runtime without detrimentally affecting the results. The pipeline performs de novo assembly-based and mapping-based analysis. Automated gene calling and annotation is performed on the assembled contigs. Read-mapping is used to call variants (single nucleotide polymorphisms and insertion/deletions) against a reference S. aureus chromosome (Type strain, N315).
We ran the analysis pipeline on more than 43,000 S. aureus shotgun Illumina genome projects in the public ENA database in November 2017. We found that only a quarter of known multi-locus sequence types (STs) were represented but the top ten STs made up 70% of all genomes. MRSA (methicillin-resistant S. aureus) were 64% of all genomes. Using the Staphopia database we selected 380 high quality genomes deposited with good metadata, each from a different multi-locus sequence type, as a non-redundant diversity set for studying S. aureus evolution. In addition to answering basic science questions, Staphopia could serve as a potential platform for rapid clinical diagnostics of S. aureus isolates in the future. The system could also be adapted as a template for other organism-specific databases.
... The RNA structure and RNA-RNA interaction prediction approaches that have been discussed here are typically used via either corresponding web servers or command-line interfaces of local installations. To enhance reproducibility and to accommodate large-scale application, pipeline and workflow systems like Galaxy (110,111) and bioconda (112) have been developed. Recently, the "RNA workbench" extension of Galaxy was published (113), which features many of the approaches outlined here. ...
Structure and Interaction Prediction in Prokaryotic RNA Biology
Abstract
Many years of research in RNA biology have soundly established the importance of RNA-based regulation far beyond most early traditional presumptions. Importantly, the advances in “wet” laboratory techniques have produced unprecedented amounts of data that require efficient and precise computational analysis schemes and algorithms. Hence, many in silico methods that attempt topological and functional classification of novel putative RNA-based regulators are available. In this review, we technically outline thermodynamics-based standard RNA secondary structure and RNA-RNA interaction prediction approaches that have proven valuable to the RNA research community in the past and present. For these, we highlight their usability with a special focus on prokaryotic organisms and also briefly mention recent advances in whole-genome interactomics and how this may influence the field of predictive RNA research.