Simon Gladman’s research while affiliated with University of Melbourne and other places


Publications (18)


Statistics about training events organized by the Galaxy community and visits of the GTN website over the years
(A) Number of events supported by GTN materials, registered on the Galaxy Community Hub website [22], per year between 2018 and 2021. (B) Number of TIaaS events per year on Galaxy Europe (collected in May 2022). (C) Number of participants in TIaaS events per year on UseGalaxy.* (collected in May 2022). (D) Number of visits per year on the GTN website and per topic, initially tracked by Google Analytics and later with Plausible (collected in May 2022). The latest usage statistics are publicly available from https://plausible.galaxyproject.eu/training.galaxyproject.org.
Content of material available on the GTN and feedback from learners
(A) Evolution of the number of topics, tutorials, and contributors over the months between 2017 and 2022. (B) Number and type of tutorials per topic available on the GTN in April 2022. The latest statistics are publicly available from https://training.galaxyproject.org/stats. (C) Type of supporting materials for tutorials per topic available on the GTN in April 2022. (D) Score of the embedded feedback in the tutorials per topic. Three questions are asked in the form: “How much did you like this tutorial?” (from 1 (bad) to 5 (great)), “What did you like?”, “What could be improved?”. The latest feedback results are publicly available from https://training.galaxyproject.org/feedback.
Structure of a GTN tutorial and its features
(A) Screenshots for parts of a GTN tutorial to show its structure. Title and authors are listed, followed by an overview box containing metadata about the tutorial (target audience, learning objectives, prerequisites, supporting materials, time estimate, etc). The tutorial content itself is a mix of theory (text) and so-called “hands-on” boxes describing practical steps to be performed in Galaxy. Question and answer boxes may be added at any point in the tutorial to allow students to self-evaluate as they progress. The end of a tutorial provides a box with take-home messages, references, and suggestions for further reading and follow-up tutorials. Additionally, a feedback form is embedded at the end of every tutorial. Access to FAQs, support channels, page view statistics, and translated versions of the materials are provided via the top menu of the webpage. (B) Features of a GTN tutorial grouped in 4 categories: rich metadata, structured content, strong (technical) support, and community oriented.
Typical process to create a new tutorial
Authors usually start by identifying suitable input datasets and developing the workflow in Galaxy. The workflow can then be automatically converted into a tutorial skeleton using the PTDK described in the next section. The tutorial is then tested, reviewed, and finally merged into the GTN.
Implementation of the “Ten simple rules for making training materials FAIR” [16] in the GTN


Galaxy Training: A powerful framework for teaching!
  • Article
  • Full-text available

January 2023 · 238 Reads · 60 Citations · Helena Rasche · Simon Gladman · [...]

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.


Figure 3: Map of countries targeted by TIaaS events. This combines 2 datasets: the statistics provided by the Application Programming Interfaces (APIs) of the 4 discussed TIaaS servers and a set of corrections from course registration data for the Smörgåsbord event series. This correction is needed as the authors did not sufficiently fill out the TIaaS form when they requested resources for the Smörgåsbord event, choosing to specify only a single country, which would otherwise result in potential undercounting of countries actually targeted by TIaaS managed events.
Figure 4: Since its introduction, it has grown into a well-used service over the past 4 years. There have been 438 training events, primarily hosted by the Australian and European servers, which are both very involved in training. Event length distribution in days is extremely heavily skewed to very short events, with a long tail of semester-long courses using the platform. Event sizes show a similar distribution; most classes are small, while 7 extremely large courses (>500 participants) were filtered from this graph as outliers. These courses are more like Massive Open Online Courses (MOOCs) than traditional in-person courses.
Training Infrastructure as a Service

December 2022 · 131 Reads · 10 Citations · GigaScience

Background: Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this places a significant prerequisite knowledge or labor barrier for instructors, who must spend time coordinating deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses. Findings: Originally developed by Galaxy Europe and the Gallantries project, together with the Galaxy community, we have created Training Infrastructure-as-a-Service (TIaaS), aimed at providing user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure, which ensures jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress. Conclusions: TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.


Fig 4. Typical process to create a new tutorial. Authors usually start by identifying suitable input datasets and developing the workflow in Galaxy. The workflow can then be automatically converted into a tutorial skeleton using the PTDK described in the next section. The tutorial is then tested, reviewed, and finally merged into the GTN.
Galaxy Training: A Powerful Framework for Teaching!

June 2022 · 119 Reads · 1 Citation

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis and stewardship are still rarely taught in life science educational programs [1], resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform [2]. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.


Expanding the Galaxy’s reference data

April 2022 · 69 Reads · 2 Citations · Bioinformatics Advances

Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. Additionally, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie’s remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS repository from GalaxyProject.org, with mirrors across the United States, Canada, Europe, and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license.


Figure 1. Usage of the usegalaxy servers in Australia (AU), the European Union (EU) and the United States (US). Large compute infrastructure is available to anyone, for free, without any configuration and it spans the world (more below). User acquisition, user retention, and user activity are captured. A dip in usage captured at the right-hand side of some diagrams is cyclical, due to the end of the calendar year. A significant increase in the number of monthly jobs in the EU is due to the start of analyzing SARS-CoV-2 data (more below).
Figure 2. Categorization of the type of tools executed by users across the three most popular usegalaxy servers.
Figure 3. A sample workflow report, showing tSNE and UMAP plots of single cell expression data, automatically generated and formatted based on the outputs of a workflow.
Figure 4. (A) The Galaxy-ML toolkit provides all the tools necessary to define a learner, train it, evaluate it, and visualize its performance. (B) A Galaxy workflow to create a learner using a pipeline, perform hyperparameter search and visualize the results.
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update

April 2022 · 509 Reads · 718 Citations · Nucleic Acids Research

Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.



Eight low frequency allelic-variants co-occurred in 8 samples.
Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring

March 2021 · 65 Reads · 10 Citations

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data. Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. To establish analytical capacity it is necessary to pull together global computational resources and deliver the best open source tools and analysis workflows within a ready to use, universally accessible resource. Such a resource should not be controlled by a single research group, institution, or country. Instead it should be maintained by a community of users and developers who ensure that the system remains operational and populated with current tools. A community is also essential for facilitating the types of discourse needed to establish best analytical practices. Bringing together public computational research infrastructure from the USA, Europe, and Australia, we developed a distributed data analysis platform that accomplishes these goals. It is immediately accessible to anyone in the world and is designed for the analysis of rapidly growing collections of deep sequencing datasets. We demonstrate its utility by detecting allelic variants in high-quality existing SARS-CoV-2 sequencing datasets and by continuous reanalysis of COG-UK data. All workflows, data, and documentation are available at https://covid19.galaxyproject.org.


Expanding the Galaxy’s reference data

October 2020 · 213 Reads · 1 Citation

Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve, and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. Additionally, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie’s remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS repository from GalaxyProject.org, with mirrors across the United States, Canada, Europe, and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 20.05, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/galaxyproject/tools-iuc/data_managers/data_manager_refgenie_pull and released using an MIT license.


Distribution of nucleotide changes across SARS-CoV-2 genome
AF, minor allele frequency; POS, position; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Amino acid alignment of spike glycoprotein regions HR1 (A) and HR2 (B). The site of the Lys⁹²¹Gln substitution observed by us in a SARS-CoV-2 isolate is highlighted with a black rectangle in panel A. Its corresponding salt bridge partner is highlighted with a black rectangle in panel B. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Location of potential recombination breakpoints along the S gene (GARD analysis)
Analysis of branch-specific positive diversifying selection (aBSREL) along the branch leading to SARS-CoV-2 (MN988688)
SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Methods used for the analysis of primary SARS-CoV-2 data
No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics

August 2020 · 123 Reads · 28 Citations

The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.


Community-Driven Data Analysis Training for Biology

June 2018 · 181 Reads · 191 Citations · Cell Systems

The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org.


Citations (14)


... Technology Knowledge (TK) construct (Humaera et al., 2023) Professional Role (X1) The teacher negotiates his/her knowledge with colleagues and cooperates with colleagues in academic's matters (Shibing, 2024) The teacher respects and observes various cultural norms and customs. The teacher also adjusts the cultural differences and mismatches (Ceruelos & Ledezma, 2022) Institutional Support (X2) General Perceived Ease of Use of e-Learning Technology (Makhaya & Ogange, 2019) Institutional Support Factors for e-Learning Adoption (Syahdan et al., 2022) Technical and Training support for Students and Staff for e-Learning Adoption (Rustandi et al., 2024) Training and Access to Technical Support (Rasche et al., 2023) The research instrument consists of three main parts that reflect the research variables, namely teachers' readiness in educational technology (Y), teachers' professional roles (X1), and institutional support (X2). The indicators used were adapted from related literature to measure each variable. ...

Reference:

English Teachers' Readiness to Adopt Educational Technology: Professional Roles and Institutional Support
Training Infrastructure as a Service

GigaScience

... These technologies generate vast amounts of data that require multi-disciplinary teams and sophisticated computational methods for analysis and interpretation [1]. To meet the increasing demand for well-trained data scientists in biology, substantial efforts are being directed toward establishing and disseminating pedagogical best practices [2][3][4]. ...

Galaxy Training: A powerful framework for teaching!

... These are deployed within the Galaxy bioinformatics ecosystem [16], providing a workbench wherein scientists can share, analyze, and visualize their results in a reproducible manner, carrying out analyses on scalable computer resources accessed through a web browser interface. The ecosystem also provides extensive online and on-demand training material via the Galaxy Training Network [17], including guidance on the usage of the platform for SARS-CoV-2 studies [18]. ...

Galaxy Training: A Powerful Framework for Teaching!

... Galaxy [74] is an open source web platform for FAIR data analysis and workflow management. It was developed in 2005 to support genome analysis within the bioinformatics community. ...

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update

Nucleic Acids Research

... While Galaxy already offers robust support for SARS-Cov-2 55 and MPOX analyses, expanding its capabilities to better handle viruses, archaea, and eukaryotes will be crucial in addressing gaps in microbial research. Indeed, despite their important roles in microbial communities, non-bacterial entities are often underrepresented in many ecological studies due to the domination of prokaryotic signals 56 . ...

Ready-to-use public infrastructure for global SARS-CoV-2 monitoring
  • Citing Article
  • September 2021

Nature Biotechnology

... Mapping of reads and variant calling were conducted on the Galaxy server by running the workflow "SARS-CoV-2 Illumina Amplicon pipeline -iVar based". 10 Obtained consensus sequences were compared to reference by running COVID-Align. 11 Amino acid changes variations were identified at protein level by running Nextclade. ...

Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring

... Raw metabarcoding sequence data was analyzed using the eDNA Flow pipeline (Mousavi-Derazmahalleh et al. 2021), where data were demultiplexed and trimmed. Demultiplexed sequences were then processed in the Galaxy Europe platform (Batut et al. 2017) using the Mothur package (v 1.33; Schloss et al. 2009). Briefly sequences having at least 110 bp in length with no ambiguous bases, and no more than 12 homopolymers were retained. ...

Community-driven data analysis training for biology

... can be understood, and also a beneficial use of the genetic information when designing countermeasures to an infectious disease [5,6]. However, because this data can be misused, there may be hesitance to share genetic data that has been collected [7], or perhaps even withhold publishing to avoid pressure to share the data. ...

No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics

... + galaxy 1, accessed on 25 May 2022, [114]) was used to count the reads of the annotated GENCODE genes, version 33, hg38 [114]. The resulting reads were converted to transcripts per kilobase million (TPM) values [115]. The DeSeq2 tool (Galaxy Version 2.11.40.8 + galaxy0, accessed 13 February 2024 [116]) was used to identify differentially expressed genes (DEG) in pairwise comparison. ...

Community-Driven Data Analysis Training for Biology
  • Citing Article
  • June 2018

Cell Systems

... Data sources use heterogeneous file formats depending on the research domain and the information to be represented, among others 1 [100,58]. To process user-configured or data source module input files, as many commonly used file formats as possible need to be supported to ensure flexibility and extensibility. ...

Best practice data life cycle approaches for the life sciences