May 2024 · 253 Reads · 79 Citations
February 2023 · 52 Reads · 13 Citations
Genome Research
There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.
April 2022 · 526 Reads · 745 Citations
Nucleic Acids Research
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
March 2022 · 74 Reads
There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For over a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. In order to streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing and executing Galaxy tools, workflows and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers. Planemo is a mature project widely used within the Galaxy community which has been downloaded over 80,000 times.
January 2022 · 260 Reads · 123 Citations
Cell Genomics
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement while also adding security measures for active threat detection and monitoring and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.
August 2021 · 64 Reads · 140 Citations
Communications of the ACM
8 pages, 3 figures. For the LaTeX source code of this paper, see https://github.com/mr-c/cwl_methods_included
May 2021 · 157 Reads · 2 Citations
A widely used standard for portable multilingual data analysis pipelines would enable considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and the environment. Published research that used multiple computer languages for its analysis pipelines would include a complete and reusable description of that analysis, runnable on a diverse set of computing environments. Researchers would be able to collaborate more easily and reuse these pipelines, adding or exchanging components regardless of the programming language used; collaborations with and within industry would be easier; and approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact would also be reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. However, without a standard for reusable and portable multilingual workflows, reusing published multilingual workflows, collaborating on open problems, and optimizing their execution would be severely hampered. Moreover, only a widely used standard for multilingual data analysis pipelines would enable considerable benefits to research-industry collaboration, regulatory cost control, and preserving the environment. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although there exist hundreds of single-vendor and other single-source systems that run workflows, none is a general, community-driven, and consensus-built standard.
October 2020 · 6 Reads · 1 Citation
Bioinformatics
Motivation: The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/.
August 2020 · 129 Reads · 29 Citations
The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.
May 2020 · 36 Reads · 1 Citation
The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact: Enis Afgan (enis.afgan@jhu.edu). Supplementary information: None.
... All the analyses were conducted on the European Galaxy server (https://usegalaxy.eu; accessed on 6 October 2023) [37]. ...
May 2024
... The connection of WorkflowHub to the LifeMonitor service 41, through the LifeMonitor GitHub app, allows workflow function and status to be reported to maintainers and users through regular automated tests driven by continuous integration (CI) based monitoring (e.g. Planemo automated workflow testing using Galaxy [50]). In these cases, WorkflowHub will also include a badge that shows if the tests are passing or failing. ...
February 2023
Genome Research
... This literature thus presents a contrast between discussions that promote awareness, adoption, and implementation of established standards, versus authors presenting and promoting a new standardization mechanism that may not have been used or implemented anywhere except by the authors (Crusoe et al., 2022). Both can lead to standardization downstream, depending on many factors, as discussed further in the next section. ...
August 2021
Communications of the ACM
... Differentially expressed genes between NAS 0-3 and NAS 4-6 (all fibrosis stage ≤ 1) of the cohort 1 datasets were identified with edgeR (v3.36.0+galaxy5) [47] on the Galaxy Australia Bioinformatics Platform (https://usegalaxy.org.au/; accessed on 22 May 2025) [48]. Genes with very low expression (counts per million < 1 in a minimum of 3 samples) were excluded. ...
April 2022
Nucleic Acids Research
... Ongoing initiatives aim to enhance the accessibility of these bioinformatic software tools while also promoting the reproducibility of genomic analyses. Galaxy, KBase, AnVIL, Anvi'o, and QIIME2 are excellent examples of web-based tools, computing environments, or software ecosystems that are actively maintained by a large community of scientists [7][8][9][10][11]. Many instances of Galaxy are freely available worldwide, providing easy access to thousands of specialized bioinformatics tools, regardless of the user's level of computer training. ...
January 2022
Cell Genomics
... As a result, container technologies like Docker and Singularity are becoming increasingly used within the community as tools to quickly and reliably deploy bioinformatics software 22,23. In addition, tool or workflow definition standards and workflow engines are becoming more widely used within many pipeline and software stacks [24][25][26][27][28]. As such, we have developed an implementation of the eCLIP bioinformatics pipeline that leverages these technologies and standards to improve portability and reproducibility of our eCLIP data analysis methods. ...
May 2021
... Changes will persist across successive boots for non-volatile storage media such as USB sticks, ideal in deployment scenarios with infrequent or absent internet access. The annotation components will additionally be merged into the Bioconda [7] bioinformatic software distribution for the benefit of the wider bioinformatic community. ...
October 2017
... Next, we further evaluated these assemblies by looking at their 'informational' content. We detected differences between assemblies by estimating the full-length transcript 'coverage' of the different assembled transcripts, or as we prefer to call them, transfragments, when compared to the Uniprot_Sprot protein database with Blastx [14,15]. We selected Uniprot_Sprot because this is a high quality database [16][17][18][19]. ...
May 2015
... Raw metabarcoding sequence data were analyzed using the eDNA Flow pipeline (Mousavi-Derazmahalleh et al. 2021), where data were demultiplexed and trimmed. Demultiplexed sequences were then processed in the Galaxy Europe platform (Batut et al. 2017) using the Mothur package (v 1.33; Schloss et al. 2009). Briefly, sequences having at least 110 bp in length with no ambiguous bases, and no more than 12 homopolymers were retained. ...
April 2018
... In the past few years several toolboxes have been released in an effort to address such challenges with using Galaxy [14][15][16][17][18][19]. Yet, these toolkits are often designed to analyse only one specific dimension of transcriptome diversity, and/or are not fully automated and require some prior knowledge of R command-line scripting [20]. ...
September 2016