Fig 2
A comparison of the architectures of virtual machines and Docker software containers. Virtual machines are denoted by cyan boxes and software containers by green boxes. The left stack is a Type-2 virtual machine (VM), which uses a hypervisor to emulate the guest OS. The application software, dependencies, and guest OS are all contained inside the VM; a separate VM, with its own dependencies and guest OS, is required for each application stack to be deployed. The middle stack depicts Docker containers on a Linux host. Docker uses the host Linux system and packages the application and its dependencies into modular containers; no VM is necessary, and the OS resources for the two application stacks are shared between the different containers. The right stack depicts Docker on a non-Linux system. Because Docker requires Linux, a lightweight VM with a mini-Linux guest OS is needed to run Docker and encapsulate the software containers. This still has the advantage that only a single VM and guest Linux system are required regardless of the number of containers.
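The container scenario in the caption can also be driven programmatically. Below is a minimal sketch using the Docker SDK for Python (docker-py), assuming the SDK is installed and a Docker daemon is reachable (on non-Linux hosts, that is the daemon inside the mini-Linux VM); the image and command are illustrative only.

```python
# Minimal sketch: run one application stack as a container via docker-py.
# The image name and command are placeholders, not from the publication.
import docker

client = docker.from_env()  # connects to the local Docker daemon

# Each application stack ships as its own container image; the host OS
# kernel is shared, so no per-application guest OS is required.
output = client.containers.run(
    "python:3.11-slim",
    ["python", "-c", "print('hello from a container')"],
    remove=True,  # clean up the container after it exits
)
print(output.decode())
```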
Source publication
Reproducibility is vital in science. For complex computational methods, it is often necessary not just to recreate the code, but also the software and hardware environment, to reproduce results. Virtual machines and container software such as Docker make it possible to reproduce the exact environment regardless of the underlying hardware and operating system...
Similar publications
Yibo Gao, Na Sun, Lieju Wang, [...], Min Yan
Neuropathic pain (NP) is a type of chronic pain distinct from common types of pain. The mechanisms of NP are still poorly understood. Exploring the key genes and neurobiological changes in NP could provide important diagnostic and treatment tools for clinicians. GSE24982 is an mRNA-seq dataset that we downloaded from the Gene Expression...
Aim
In this study, significant differentially expressed genes (DEGs) related to gastric cancer (GC) and chronic gastritis were screened to identify genes common to, and distinct between, the two diseases.
Background
Diagnosis of gastric cancer, a deadly disease, and chronic gastritis, a stomach disorder that can be considered a risk factor o...
Sorghum bicolor is the fifth most important cereal crop in the world after rice, wheat, barley and maize, and is grown worldwide in semi-arid and arid regions. Functional identification of proteins and a detailed study of protein-protein interactions in Sorghum are essential to understand the biological processes underlying its various traits...
Background:
The development of high-throughput techniques for monitoring various aspects of gene activity has generated a massive explosion of data. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task.
Objectives:...
In this paper we present new data export modules for Cytoscape 3 that can generate network files for Cytoscape.js and D3.js. The Cytoscape.js exporter is implemented as a core feature of Cytoscape 3, and the D3.js exporter is available as a Cytoscape 3 app. These modules enable users to seamlessly export network and table data sets generated in Cytoscape t...
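For illustration, here is a hedged sketch of consuming such an exported network in Python. The file name network.cyjs is hypothetical; the top-level "elements" layout follows the documented Cytoscape.js JSON format.

```python
# Read a Cytoscape.js-format network exported from Cytoscape 3 and
# report its size. File name is a placeholder.
import json

with open("network.cyjs") as fh:   # file produced by the exporter
    net = json.load(fh)

nodes = net["elements"]["nodes"]   # each node carries a "data" dict
edges = net["elements"]["edges"]   # each edge has "source"/"target" in "data"
print(f"{len(nodes)} nodes, {len(edges)} edges")
```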
Citations
... The OJS system is built with the PHP programming language and a MySQL database. A virtual machine acting as the web container becomes the container hub that performs all management of the Docker system, while the app container, holding PHP and MySQL, acts as the worker (Hung et al., 2016). ...
Introduction: The growing use of data centers has encouraged virtualization technology as an alternative for providing a dense environment that can be adjusted to demand. With hypervisor-based virtualization for managing servers in an Open Journal System data center, network administrators must allocate fairly large resources, so developing web or mobile systems takes a long time, and hypervisor-based virtualization requires access to the host kernel. Purpose: To study the attack model and vulnerabilities of Docker's internal security on the Open Journal System, covering Docker daemon attacks and DDoS attacks, when building and managing the Docker internal system. Method: This research begins with a literature study, defining the network scope, designing the system, and implementing the system based on the resulting plans, followed by testing, analysis, and drawing conclusions from the tests carried out. Finding: In the testing stage, the Docker system successfully handled DDoS attacks and the Docker daemon was successfully secured against Docker daemon attacks. Conclusion: A Docker daemon attack can occur through container misconfiguration. This flaw allows an unauthorized party to take control of a container that has already been created; by gaining root access, attackers can perform various malicious activities within the container. It is therefore important to understand and implement proper security practices in the management and configuration of Docker containers to reduce the risk of such attacks. Keywords: Docker, Container, DDoS, Attack.
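As an illustration of the misconfiguration class described above, here is a minimal sketch, assuming docker-py, that audits running containers for the privileged flag; it is an example check, not the study's actual tooling.

```python
# Flag privileged containers: a compromised privileged container gives an
# attacker root-equivalent control of the host, the risk discussed above.
import docker

client = docker.from_env()
for c in client.containers.list():
    c.reload()  # fetch the full inspect data, including HostConfig
    if c.attrs.get("HostConfig", {}).get("Privileged"):
        print(f"WARNING: container {c.name} runs privileged")
```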
... The major technical challenge in porting desktop-based image analysis applications to the cloud is supporting the same graphical interface and display on the cloud that one would see on a laptop or desktop. Bwb supports two methodologies for accomplishing this using software containers [20,21]. We combine both methods in Bwb to allow the user to export graphics from a container so that it functions both on a local laptop and on a remote cloud server. ...
Modern biomedical image analysis workflows contain multiple computational processing tasks, giving rise to problems in reproducibility. In addition, image datasets can span both spatial and temporal dimensions, with additional channels for fluorescence and other data, resulting in datasets that are too large to be processed locally on a laptop. For omics analyses, software containers have been shown to enhance reproducibility, facilitate installation and provide access to scalable computational resources on the cloud. However, most image analyses contain steps that are graphical and interactive, features that are not supported by most omics execution engines. We present the containerized and cloud-enabled Biodepot-workflow-builder platform that supports graphics from software containers and has been extended for image analyses. We demonstrate the potential of our modular approach with multi-step workflows that incorporate the popular and open-source Fiji suite for image processing. One of our examples integrates fully interactive ImageJ macros with Jupyter notebooks. Our second example illustrates how the complicated cloud setup of a computationally intensive process, such as stitching 3D digital pathology datasets using BigStitcher, can be automated and simplified. In both examples, users can leverage a form-based graphical interface to execute multi-step workflows with a single click, using the provided sample data and preset input parameters. Alternatively, users can interactively modify the image processing steps in the workflow, apply the workflows to their own data, and change the input parameters and macros. By providing interactive graphics support to software containers, our modular platform supports reproducible image analysis workflows, simplified access to cloud resources for analysis of large datasets, and integration across different applications such as Jupyter.
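One common way to export graphics from a container on a Linux host (not necessarily Bwb's exact mechanism) is to share the host's X11 socket with the container. A hedged docker-py sketch, with a placeholder image name:

```python
# Run a GUI application from a container by mounting the X11 socket and
# passing DISPLAY. Image name is hypothetical; Linux host assumed.
import os
import docker

client = docker.from_env()
client.containers.run(
    "some/gui-image:latest",   # placeholder image containing a GUI app
    environment={"DISPLAY": os.environ.get("DISPLAY", ":0")},
    volumes={"/tmp/.X11-unix": {"bind": "/tmp/.X11-unix", "mode": "rw"}},
    detach=True,
)
```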
... Though raw metabolomics data can be uploaded and accessed through online databases such as MetaboLights [24] or the Metabolomics Workbench [25], the details of data analysis are not always transparent, which reduces the ability to fully reproduce the reported findings [26]. Data analysis software with a graphical user interface (GUI) can be easy to use and document, but is restricted to predefined operations [27]. An open-source data processing script can represent every step of the data analysis while remaining flexible [28], but researchers need to adopt specific software within an integrated development environment (IDE), which also reduces reproducibility due to a lack of experience with certain software [29]. ...
... Therefore, we used SRM samples that are commercially available and commonly used in metabolomics workflows, and made the raw data accessible online for potential future research purposes. To provide full transparency on the data analysis, we chose a command-line-based script within a graphical user interface to make sure every step is recorded and reproducible by other researchers [27]. A Docker image, xcmsrocker, was created based on the Rocker image [32], which pre-installs most of the R-based metabolomics and NTA data analysis software. ...
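A hedged sketch of launching the xcmsrocker image with docker-py follows. The RStudio port (8787) and PASSWORD variable follow the convention of the Rocker images it is stated to build on; both are assumptions to check against the image documentation.

```python
# Start the xcmsrocker image and expose RStudio on the host (assumed
# Rocker convention: RStudio Server on port 8787, PASSWORD required).
import docker

client = docker.from_env()
container = client.containers.run(
    "yufree/xcmsrocker",
    ports={"8787/tcp": 8787},               # map RStudio's port to the host
    environment={"PASSWORD": "change-me"},  # Rocker images need a password
    detach=True,
)
print(container.status)  # e.g. 'created'; browse to localhost:8787 once running
```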
Unknown features in untargeted metabolomics and non-targeted analysis (NTA) are identified using fragment ions from MS/MS spectra to predict the structures of the unknown compounds. Selection of the precursor ion for fragmentation is commonly performed using data-dependent acquisition (DDA) strategies or following statistical analysis using targeted MS/MS approaches. However, the precursor ions selected by DDA cover only a biased subset of the peaks or features found in full-scan data. In addition, different statistical analyses can select different precursor ions for MS/MS analysis, which makes post-hoc validation of ions selected by a secondary analysis impossible for precursor ions selected by the original statistical method. Here we propose an automated, exhaustive, statistical-model-free workflow: paired mass distance-dependent analysis (PMDDA), for reproducible untargeted mass spectrometry MS2 fragment-ion collection of unknown compounds found in the MS1 full scan. Our workflow first removes redundant peaks from MS1 data and then exports a list of precursor ions for pseudo-targeted MS/MS analysis of independent peaks. This provides comprehensive MS2 coverage of unknown compounds found in full-scan analysis using a "one peak for one compound" workflow without a priori redundant-peak information. We compared pseudo-spectra formation and the number of MS2 spectra linked to MS1 data using the PMDDA workflow to those obtained using the CAMERA and RAMClustR algorithms. More annotated compounds, molecular networks, and unique MS/MS spectra were found using PMDDA than with CAMERA and RAMClustR. In addition, PMDDA can generate a preferred ion list for iterative DDA to enhance coverage of compounds when instruments support such functions. Finally, compounds with signals in both positive and negative modes can be identified by the PMDDA workflow, further reducing redundancies. The whole workflow is fully reproducible as a Docker image, xcmsrocker, with both the original data and the data processing template.
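The paired-mass-distance idea can be illustrated with a small, self-contained sketch (not the published PMDDA implementation): flag MS1 peak pairs whose m/z difference matches a known relationship, so redundant peaks can be collapsed before exporting the precursor list. The peak values and tolerance are toy numbers.

```python
# Toy paired-mass-distance scan: find peak pairs separated by a known
# mass difference (13C isotope spacing, Na-for-H adduct exchange).
import numpy as np

mz = np.array([180.0634, 181.0667, 203.0486])      # toy MS1 peak list (Da)
known_pmd = {"13C isotope": 1.00336, "Na-H adduct": 21.98194}
tol = 0.002                                        # assumed m/z tolerance (Da)

for i in range(len(mz)):
    for j in range(i + 1, len(mz)):
        d = abs(mz[j] - mz[i])
        for name, pmd in known_pmd.items():
            if abs(d - pmd) < tol:
                # a matched pair marks the peaks as likely redundant
                print(f"{mz[i]:.4f} and {mz[j]:.4f}: {name} pair")
```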
... Containerized software or code can be run with its dependencies installed within the container, isolated from packages or dependencies already installed in the host system. Nowadays, both console-based software and software with a graphical user interface (GUI) can be containerized [122,123], and software containers support both Linux- and Windows-based applications [124]. Some commonly used software containerization tools are Docker and Singularity [125,126], with Singularity having better support for high-performance computing [127]. ...
Clinical metabolomics has emerged as a novel approach for biomarker discovery with the translational potential to guide next-generation therapeutics and precision health interventions. However, reproducibility in clinical research employing metabolomics data is challenging. Checklists are a helpful tool for promoting reproducible research. Existing checklists that promote reproducible metabolomics research have primarily focused on metadata and may not be sufficient to ensure reproducible metabolomics data processing. This paper provides a checklist of actions that researchers need to take to make computational steps reproducible in clinical metabolomics studies. We developed an eight-item checklist that includes criteria related to reusable data sharing and reproducible computational workflow development. We also provide recommended tools and resources to complete each item, as well as a GitHub project template to guide the process. The checklist is concise and easy to follow. Studies that follow this checklist and use the recommended resources may make it easier and more efficient for other researchers to reproduce metabolomics results.
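One concrete checklist-style practice for reproducible computational workflows is pinning container images by immutable digest rather than by mutable tag. A minimal sketch assuming docker-py, with an example image:

```python
# Resolve an image tag to its content digest so a workflow can later be
# re-run against byte-identical software. Image name is only an example.
import docker

client = docker.from_env()
image = client.images.pull("python", tag="3.11-slim")
# RepoDigests pins the exact content; a tag like "3.11-slim" can move
print(image.attrs["RepoDigests"][0])
```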
... The package is distributed as open-source software under a GPL3 licence. It is available for Linux and MacOS systems, as well as a Docker image [18], which can be deployed on Windows. It can be used as a Python library integrated into third-party applications, or used directly from the command line and called from bash scripts. ...
We introduce Shennong, a Python toolbox and command-line utility for speech feature extraction. It implements a wide range of well-established, state-of-the-art algorithms, including spectro-temporal filters such as Mel-frequency cepstral filterbanks and predictive linear filters, pre-trained neural networks, pitch estimators, speaker normalization methods and post-processing algorithms. Shennong is an open-source, easy-to-use, reliable and extensible framework. The use of Python makes integration with other speech modeling and machine learning tools easy. It aims to replace or complement several heterogeneous pieces of software, such as Kaldi or Praat. After describing the Shennong software architecture, its core components and implemented algorithms, this paper illustrates its use in three applications: a comparison of speech feature performance on a phone discrimination task, an analysis of a vocal tract length normalization model as a function of the speech duration used for training, and a comparison of pitch estimation algorithms under various noise conditions.
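A hedged sketch of MFCC extraction with Shennong follows; the module paths and class names follow the project's documented quickstart but should be treated as assumptions and verified against the installed version.

```python
# Extract MFCC features from a wav file with Shennong.
# NOTE: API paths below are assumed from the documentation and may
# differ between Shennong versions.
from shennong.audio import Audio
from shennong.processor.mfcc import MfccProcessor

audio = Audio.load("utterance.wav")                       # any mono wav file
processor = MfccProcessor(sample_rate=audio.sample_rate)  # match input rate
features = processor.process(audio)                       # frames x coefficients
print(features.shape)
```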
... The major technical challenge in porting desktop-based image analysis applications to the cloud is supporting the same graphical interface and display on the cloud that one would see on a laptop or desktop. Bwb supports two methodologies for accomplishing this using software containers [6,7]. We combine both methods in Bwb to allow the user to export graphics from a container so that it functions both on a local laptop and on a remote cloud server. ...
Biomedical image analyses can require many steps processing different types of data. Analysis of increasingly large data sets often exceeds the capacity of local computational resources. We present an easy-to-use and modular cloud platform that allows biomedical researchers to reproducibly execute and share complex analytical workflows to process large image datasets. The workflows and the platform are encapsulated in software containers to ensure reproducibility and facilitate installation of even the most complicated workflows. The platform is both graphical and interactive, allowing users to use the viewer of their choice to adjust the image pre-processing and analysis steps and iteratively improve the final results. We demonstrate the utility of our platform via two use cases in focal adhesion and 3D imaging analyses. In particular, our focal adhesion workflow demonstrates integration of Fiji with Jupyter Notebooks. Our 3D imaging use case applies Fiji/BigStitcher to big datasets on the cloud. The accessibility and modularity of the cloud platform democratize the application and development of complex image analysis workflows.
... Though raw metabolomics data can be uploaded and accessed through online databases such as MetaboLights (Haug et al., 2020) or the Metabolomics Workbench (https://www.metabolomicsworkbench.org/), the details of data analysis are not as transparent as the data sharing, which reduces the ability to fully reproduce the reported findings (Goodman et al., 2016). Data analysis software with a graphical user interface (GUI) can be easy to use and document, but is restricted to predefined operations (Hung et al., 2016). An open-source data processing script can represent every step of the data analysis while remaining flexible (Gandrud, 2013), but researchers need to adopt specific software within an integrated development environment (IDE), which also reduces reproducibility due to a lack of experience with certain software (Boettiger, 2015). ...
... Therefore, we used SRM samples that are commercially available and commonly used in metabolomics workflows, and made the raw data accessible online for potential future research purposes. To provide full transparency on the data analysis, we chose a command-line-based script within a graphical user interface to make sure every step is recorded and reproducible by other researchers (Hung et al., 2016). A Docker image, xcmsrocker, was created based on the Rocker image (Boettiger and Eddelbuettel, 2017), which pre-installs most of the R-based metabolomics and NTA data analysis software. ...
Motivation
Unknown features in untargeted metabolomics and non-targeted analysis (NTA) are identified using fragment ions from MS/MS spectra to predict the structures of the unknown compounds. Selection of the precursor ion for fragmentation is commonly performed using data-dependent acquisition (DDA) strategies or following statistical analysis using targeted MS/MS approaches. However, the precursor ions selected by DDA cover only a biased subset of the peaks or features found in full-scan data. In addition, different statistical analyses can select different precursor ions for MS/MS analysis, which makes post-hoc validation of ions selected by new statistical methods impossible for precursor ions selected by the original statistical method. By removing redundant peaks and performing pseudo-targeted MS/MS analysis on independent peaks, we can comprehensively cover unknown compounds found in full-scan analysis using a "one peak for one compound" workflow without a priori redundant-peak information. Here we propose a reproducible, automated, exhaustive, statistical-model-free workflow: paired mass distance-dependent analysis (PMDDA), for untargeted mass spectrometry identification of unknown compounds found in the MS1 full scan.
Results
More annotated compounds, molecular networks and unique MS/MS spectra were found using PMDDA than with CAMERA and RAMClustR. Meanwhile, PMDDA can generate a preferred ion list for iterative DDA to cover more compounds when instruments support such functions.
Availability and implementation
The whole workflow is fully reproducible as a Docker image, xcmsrocker, with both the original data and the data processing template: https://hub.docker.com/r/yufree/xcmsrocker. A related R package is developed and released online: https://github.com/yufree/rmwf. R scripts, data files and links to GNPS annotation results, including the MS1 peak list and MS2 MGF files, are provided in the supplementary information.
... Containers have seen an increased uptake in the life sciences, both for delivering software tools and for facilitating data analysis in various ways [14-18]. When running more than just a few containers, an orchestration system is needed to coordinate and manage their execution and to handle issues related to, e.g., load balancing, health checks and scaling. ...
Containers are gaining popularity in life science research as they provide a solution for encompassing the dependencies of provisioned tools, simplify software installation for end users and offer a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to aid in creating reproducible analyses. In this article, we review a number of approaches to using containers as implemented in the workflow tools Nextflow, Galaxy, Pachyderm, Argo, Kubeflow, Luigi and SciPipe, when deployed in cloud environments. A particular focus is placed on the workflow tools' interaction with the Kubernetes container orchestration framework.
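To make the orchestration concrete, here is a minimal sketch, assuming the official kubernetes Python client and a configured cluster, that submits one containerized analysis step as a Kubernetes Job; this is the kind of execution the reviewed workflow tools automate and chain. Names and the image are placeholders.

```python
# Submit a single containerized step as a Kubernetes Job.
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="analysis-step"),  # placeholder name
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="step",
                    image="python:3.11-slim",  # placeholder tool image
                    command=["python", "-c", "print('step done')"],
                )],
            )
        )
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```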
... In the past few years several toolboxes have been released in an effort to address such challenges using Galaxy [14-19]. Yet these toolkits are often designed to analyse only one specific dimension of transcriptome diversity, and/or are not fully automated, requiring some prior knowledge of R command-line scripting [20]. ...
Background
As the number of RNA-seq datasets available to explore transcriptome diversity increases, so does the need for easy-to-use, comprehensive computational workflows. Many available tools facilitate analysis of one of the two major mechanisms of transcriptome diversity, namely differential expression of isoforms due to alternative splicing, while the second major mechanism, RNA editing due to post-transcriptional changes of individual nucleotides, remains under-appreciated. Both mechanisms play an essential role in physiological and disease processes, including cancer and neurological disorders. However, elucidation of RNA editing events at the transcriptome-wide level requires increasingly complex computational tools, in turn resulting in a steep entry barrier for labs that are interested in high-throughput variant calling applications on a large scale but lack the manpower and/or computational expertise.
Results
Here we present an easy-to-use, fully automated computational pipeline (Automated Isoform Diversity Detector, AIDD) that contains open-source tools for the various tasks needed to map transcriptome diversity, including RNA editing events. To facilitate reproducibility and avoid system dependencies, the pipeline is contained within a pre-configured VirtualBox environment. The analytical tasks and format conversions are accomplished via a set of automated scripts that enable the user to go from a set of raw data, such as fastq files, to publication-ready results and figures in one step. A publicly available dataset of Zika virus-infected neural progenitor cells is used to illustrate AIDD's capabilities.
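The "raw fastq to results in one step" pattern can be illustrated with a short driver sketch (not AIDD's actual scripts): a driver runs each pipeline stage in order and stops on the first failure. The tools, flags and file names are placeholders for whatever a given pipeline wires together.

```python
# Illustrative one-step pipeline driver: run each stage in order and
# abort on the first failure. Commands below are placeholders.
import subprocess

stages = [
    ["fastqc", "sample.fastq"],                                    # QC report
    ["hisat2", "-x", "index", "-U", "sample.fastq", "-S", "sample.sam"],  # align
    ["samtools", "sort", "-o", "sample.bam", "sample.sam"],        # sort to BAM
]

for cmd in stages:
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if a stage fails
```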
Conclusions
The AIDD pipeline offers a user-friendly interface for comprehensive and reproducible RNA-seq analyses. Among the unique features of AIDD are its ability to infer RNA editing patterns, including ADAR editing, and its inclusion of Guttman scale patterns for time-series analysis of such editing landscapes. AIDD-based results show the importance of the diversity of ADAR isoforms, key RNA editing enzymes linked with the innate immune system and viral infections. These findings offer insights into the potential role of ADAR editing dysregulation in disease mechanisms, including those of congenital Zika syndrome. Because of its automated, all-inclusive features, the AIDD pipeline enables even a novice user to easily explore common mechanisms of transcriptome diversity, including RNA editing landscapes.