Orchestrating Docker Containers in the HPC Environment
Joshua Higgins, Violeta Holmes, Colin Venters
The University of Huddersfield,
Queensgate, Huddersfield, UK
{joshua.higgins, v.holmes, c.venters}@hud.ac.uk
Abstract. Linux container technology has more than proved itself useful in cloud computing as a lightweight alternative to virtualisation, whilst still offering good enough resource isolation. Docker is emerging as a popular runtime for managing Linux containers, providing both management tools and a simple file format. Research into the performance of containers compared to traditional Virtual Machines and bare metal shows that containers can achieve near native speeds in processing, memory and network throughput. A technology born in the cloud, it is making inroads into scientific computing both as a format for sharing experimental applications and as a paradigm for cloud based execution. However, it has unexplored uses in traditional cluster and grid computing. It provides a run time environment in which there is an opportunity for typical cluster and parallel applications to execute at native speeds, whilst being bundled with their own specific (or legacy) library versions and support software. This offers a solution to the Achilles heel of cluster and grid computing: the need for the user to hold intimate knowledge of the local software infrastructure. Using Docker brings us a step closer to more effective job and resource management within the cluster by providing both a common definition format and a repeatable execution environment. In this paper we present the results of our work in deploying Docker containers in the cluster environment and an evaluation of its suitability as a runtime for high performance parallel execution. Our findings suggest that containers can be used to tailor the run time environment for an MPI application without compromising performance, and would provide better Quality of Service for users of scientific computing.

Keywords: Linux containers, Docker, cluster, grids, run time environment
1 Introduction
Cloud computing has been driven by the Virtual Machine (VM). VMs are widely deployed to achieve performance and resource isolation for workloads by constraining the amount of virtual memory and processor cores available to a guest system. This allows resource sharing on a massive scale; VMs can be provisioned with any software environment and kept separate from other guests on the same physical server[18].
Linux container technology is classed as an operating system virtualisation method. It allows the creation of separate userspace instances which share the same kernel. This provides functionality similar to a VM but with a lighter footprint. The Docker project provides a management tool and its own library for communicating with containment features in the OS kernel[2].
Resource isolation takes a back seat in HPC systems, which generally execute a user's job within the same OS environment that runs directly on the hardware in order to gain the best performance. This poses a problem for application portability, especially in grid systems where a remote resource may lack the libraries or support software required by a job. This undermines efforts by middleware vendors to unify resources and provide a common format for accessing heterogeneous systems[8]. In this respect, some features of the cloud are desirable in cluster computing.
Docker containers offer an opportunity to create cloud-like flexibility in the cluster without incurring the performance limitations of a VM. This paper investigates the performance gains that can be achieved using Docker containers for executing parallel applications, compared to the KVM hypervisor[5], in a typical scientific cluster computing environment. We also propose a method of executing an MPI job encapsulated in a Docker container through the cluster resource manager.
The remainder of this paper is organised as follows. First, a short review of Docker containers versus Virtual Machines is conducted, followed by a discussion of current work. Section 4 describes the proposed Docker in the HPC cluster solution. The results from the deployment of this implementation are then evaluated and future work identified.
2 Docker vs KVM
2.1 Architecture
KVM is a popular hypervisor for Linux that introduces virtualisation support into the Linux kernel. The hypervisor provides an illusion to the guest OS that it is managing its own hardware resources[15]. A request from the guest must be translated into a request to the underlying physical hardware; a process in modern hypervisors that is highly optimised and transparent to the guest. This allows the hypervisor to host a VM without modifications to the chosen guest OS.
Docker containers do not require a layer of translation. On Linux, they are implemented using 'cgroups', a feature in the Linux kernel that allows the resources (such as CPU, memory and network) consumed by a process to be constrained[14]. The processes can then be isolated from each other using kernel 'namespaces'[13]. This fundamental difference requires that the guest system processes are executed by the host kernel, restricting containers on a Linux host to other Linux flavours only. However, it means that an executable within the container system essentially runs with no additional overhead compared to an executable in the host OS. A container is not required to perform any system initialisation; its process tree could contain just the program being run and any other programs or services that it depends on.
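As a minimal sketch of these two points (the image and program names are illustrative, and the exact flag names depend on the Docker version in use), a container can run a single program with no system initialisation, and 'cgroups' limits can be applied at start time:

    # Run one executable in a container: no init system starts, so the
    # container's process tree contains only this program.
    docker run --rm centos:6 /bin/hostname

    # Apply 'cgroups' constraints when starting the containerised
    # process: pin it to two cores and cap its memory at 1 GB.
    docker run --rm --cpuset-cpus=0,1 --memory=1g centos:6 /opt/my_program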
2.2 Performance
A benchmark by IBM Research demonstrates that the performance of a Docker container equals or exceeds KVM performance in CPU, memory and network benchmarks[12]. The results show that neither method introduces a measurable overhead for CPU and memory performance. However, the Linpack performance inside KVM is shown to be very poor: the hypervisor abstracts the hardware and processor topology, which prevents tuning and optimisation from taking place.
They also suggest that the preferential scaling topology for Docker containers is by processor socket. Preventing a container from spanning cores distributed over multiple processor sockets avoids expensive inter-processor bus communication. This is in line with the philosophy already ingrained in cluster computing applications, in which a process per core is executed on compute nodes. However, the effect may not be appreciable in distributed memory applications, where the bandwidth of the network interconnect may be many orders of magnitude slower.
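Such a placement policy can be expressed directly through Docker's 'cgroups' flags. The sketch below is a hypothetical example; it assumes cores 0-3 and 4-7 reside on sockets 0 and 1 respectively, and that a 'worker-image' exists:

    # One container per processor socket: each container is confined to
    # the cores and memory bank of a single NUMA node, so its processes
    # never generate inter-socket bus traffic.
    docker run -d --cpuset-cpus=0-3 --cpuset-mems=0 worker-image
    docker run -d --cpuset-cpus=4-7 --cpuset-mems=1 worker-image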
2.3 Storage
A Virtual Machine is traditionally accompanied by a disk image which holds the OS and run time applications. Creating or modifying this disk image requires the user to know how to install the OS, or to have systems management experience. The resulting image file may span several gigabytes. This places a large burden on storage and may be inconvenient to transfer between systems.
The Docker project introduces the concept of a 'Dockerfile', which allows the userspace of a container to be described as a list of directives in a text file that construct the real image[3]. Each directive produces a layer of the final system image; the layers are combined at run time using a copy-on-write method to appear to the container as a single unified image. This allows multiple containers to share common image layers, potentially reducing the amount of data required to transfer between systems. The Dockerfile itself is significantly easier to customise and can be easily version controlled or shared.
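For illustration, a hypothetical Dockerfile for a small MPI application might be written and built as follows (the package and binary names are placeholders); each directive yields one layer, and the base layer can be shared with any other image built on it:

    # Describe the container userspace as a list of directives; each
    # directive below becomes one copy-on-write layer of the image.
    cat > Dockerfile <<'EOF'
    FROM centos:6
    RUN yum install -y openssh-server openssh-clients mpich
    ADD my_solver /usr/local/bin/my_solver
    EOF

    # Build the image; layers that have not changed are reused.
    docker build -t my_solver:latest .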
3 Current Work
The flexibility of the cloud allows one to create multiple clusters, where each individual virtual cluster can have its own customised software environment. This draws parallels with the development of the now defunct OSCAR-V middleware[17] and Dynamic Virtual Clustering[11] concepts, which are able to provision a virtual cluster based on the requirements of a job at the time of submission. These systems still incur the overhead of running a VM as the job execution environment and inherit the associated performance limitations.
The Agave API is a gateway platform that provides services typically found
in grid systems tailored for scientific users. It has a strong focus on web standards
and boasts support for running applications packaged in Docker containers in a
cloud[10]. However, Agave orchestrates a single container per job, which limits
the scope for running parallel applications[9].
HTCondor is a middleware for high-throughput computing that has support for running jobs in parallel on dedicated machines, using 'universes' to distinguish between different execution paradigms. HTCondor itself already supports 'cgroups' to constrain the resources available to a job on Linux hosts, and a universe is under development to provide support for the Docker container format[16]. HTCondor has powerful resource discovery features, but the usefulness of a container lies precisely in not needing such knowledge of the resource.
4 Docker in the HPC Cluster
Implementing existing container and VM solutions in HPC systems requires modifications to the software stack of the local HPC resources, which would already have resource management and job scheduling in place. The methodology should follow the concept of containers: abstracting the application from the software stack of the resource. Modifying core components of this stack to support containers introduces a portability problem, and any standard resource manager already provides all the information required to orchestrate a Docker container within a resource.
A resource manager such as Torque uses a script that forms the core of the job execution and is responsible for configuring the environment and passing the required information to a process starter, such as 'mpirun'. We use both the MPICH[6] and OpenMPI[7] launchers regularly in our systems, which support SSH to enable remote process execution.
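For reference, a conventional (non-containerised) Torque job script of this kind might look as follows; Torque supplies the allocated node list through $PBS_NODEFILE, which the process starter consumes (the binary name and core counts are illustrative):

    #!/bin/bash
    #PBS -l nodes=2:ppn=4
    #PBS -N linpack

    cd $PBS_O_WORKDIR
    # The launcher reads the node allocation provided by Torque and
    # uses SSH to start one process per allocated core on each node.
    mpirun -np 8 -machinefile $PBS_NODEFILE ./xhpl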
4.1 Choosing a container model
Since we concluded that running a container has no appreciable overhead over running a normal executable, we propose two different container models for parallel execution. A single container could be started to mimic the worker node, as shown in Figure 1, which would hold all the processes assigned to that node. Alternatively, a container per process could be orchestrated, as shown in Figure 2. Whilst it is unlikely that processes within the same job would require different run time environments, this presents an interesting opportunity for resource usage accounting and can offer real time adjustment of the resource constraints per process through 'cgroups'. It can also enforce the mapping of a process to a specific processor core if this functionality is missing from the process launcher, as sketched after Figure 2.
Fig. 1. A container per node that holds all respective processes
Fig. 2. A container per process is orchestrated on each node
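An illustrative sketch of the container-per-process model (the image name and core count are hypothetical) would start one container per allocated core on each node, pinning each container to its core through 'cgroups':

    # On each worker node: one container per core. Each container is
    # pinned to a single processor core and is individually visible to
    # 'cgroups' for accounting and run time constraint adjustment.
    for core in 0 1 2 3; do
        docker run -d --cpuset-cpus=$core --name rank-$core worker-image
    done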
4.2 Container model implementation
The resource manager will not be aware of any containers, so the script that is central to the job execution will be used to prepare the nodes and start the containers before the process launcher starts the parallel job. An overview of this process is given in Figure 3. We cannot assume that the user is able to SSH to the worker nodes to run the container preparation commands; in this case the bootstrap script can also be invoked through the process launcher first, and the process launcher is then called a second time to invoke the parallel job. The script process supports both container models.
When the process launcher is called, the environment has been modified in two ways: the container will randomise its SSH port and expose it on the same interface as the host, and the SSH configuration is temporarily changed so that the process launcher will use this port instead of the default, thereby launching the process within the container and not on the node. However, if the process launcher forks the process on the rank 0 node instead of using SSH, the process will run outside the container. To avoid this condition, the script writes a unique hostname alias for each node into the SSH configuration that maps to the container on that node. These aliases are substituted into the list of nodes provided by the resource manager before being passed to the process launcher.
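A simplified sketch of this bootstrap logic is given below (the image name, file locations and process counts are illustrative; the image is assumed to run an SSH daemon and expose port 22, and error handling is omitted). For each allocated node, a container is started with its SSH daemon published on a random host port, an alias for that container is appended to the user's SSH configuration, and the alias list is handed to the launcher in place of the real node list:

    # For each node allocated by Torque, start a container publishing
    # its SSH daemon on a random host port (-P), then record a hostname
    # alias for the container in the user's SSH configuration.
    for node in $(sort -u $PBS_NODEFILE); do
        port=$(ssh $node \
            "cid=\$(docker run -d -P worker-image) && docker port \$cid 22" \
            | cut -d: -f2)
        printf 'Host container-%s\n  HostName %s\n  Port %s\n' \
            $node $node $port >> $HOME/.ssh/config
        echo container-$node >> container_nodes
    done

    # The alias list replaces the real node list, so every rank is
    # launched inside a container rather than on the bare node.
    mpirun -np 8 -machinefile container_nodes ./my_solver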
Fig. 3. Overview of script process. (Flowchart: the PBS script starts; for each worker node, SSH to the worker, start docker -d if Docker is not running, pull the image and start the container, create the container user and expose SSH on a random port; once the last worker is prepared, write the SSH host configuration and invoke the process launcher; the containers are stopped on the nodes when the PBS script finishes.)

5 Deployment of container model on HPC cluster

Eridani is the general purpose, 136 core cluster within the QueensGate Grid at the University of Huddersfield. Like the other clusters within this campus grid, it uses the Torque resource manager running on CentOS 6.5 and accepts jobs in a PBS script format. The Intel-optimised parallel LINPACK version 11.2.1.009[4] benchmark was used to measure the performance of 2 nodes with 4 cores each.
In order to validate the claim in [12] that, for the non-optimised use case, there is no appreciable difference in CPU execution performance between a Docker container and a VM, the same benchmark was run using the reference BLAS version 3.5.0 library without architecture-specific optimisations[1].
The containers were orchestrated in both the container-per-process and container-per-node models, using the implementation described in the previous section. The VMs were created to consume all available cores on the host, with appropriate memory to fit the LINPACK problem size.
5.1 Results
Figure 4 shows the experimental results using both the Intel-optimised parallel
LINPACK and generic BLAS library comparing native, Docker container models
and KVM performance. The results have been obtained from 10 runs of each
configuration. The peak performance observed per configuration is shown.
5.2 Evaluation
The LINPACK experimental results echo those obtained by previous research[12], showing that the performance of Docker containers has no appreciable difference compared to running natively, whilst the VM achieves approximately half the peak performance of the container. However, this work differs significantly in that the parallel LINPACK uses a distributed memory model, not shared memory, utilising a network interconnect for message passing to achieve data sharing between processes.
Fig. 4. LINPACK benchmarking results. (Two bar charts of peak Gflops for the Native, per-node container, per-process container and KVM configurations: the Intel-optimised LINPACK, on a scale of 0-60 Gflops, and the generic BLAS LINPACK, on a scale of 0-4 Gflops.)
Without optimisation for the processor architecture, the performance of a VM and a container are mostly comparable; however, the overall peak performance is considerably lower for this application. This suggests that the Docker container is the more appropriate execution method for high performance parallel applications, where we are likely to employ these types of optimisation. There is no difference in performance between the two container orchestration models proposed. This is expected, given that a process within the container is essentially a process running in the host kernel with 'cgroup' limitations.
6 Summary
One of the requirements of grid computing is to run a job transparently to the user on any resource they desire, without requiring knowledge of the local software configuration. Based on our research and the experimental results obtained, it is evident that Docker containers can facilitate this by abstracting the software environment of the local HPC resource without compromising performance. This improves Quality of Service for our users by:

- allowing parallel jobs to run on traditional PBS clusters with arbitrary run time environments;
- reducing the entry level of customising the run time environment to that of the average user;
- running jobs on resources within the grid where this was previously not possible due to software configuration.
7 Future Work
The container per process model offers many advantages by allowing us to apply 'cgroup' constraints to each process in an HPC job. This would allow resource management to be improved based on job requirements, as more fine-grained control can be achieved over network and disk I/O usage in addition to CPU time[14]. It also provides scope for optimising the power consumption of a job, as limits can be changed in real time without restarting the process.
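A sketch of this run time adjustment is shown below (the container name is hypothetical, the cgroup mount path varies between distributions, and root privileges on the host are required). A running container's CPU allowance can be rewritten directly through the cgroup filesystem while its process continues to execute:

    # Throttle a running container to roughly half of one core by
    # rewriting its CPU quota (per 100ms period) in the cgroup
    # filesystem, without stopping or restarting the process.
    cid=$(docker inspect --format '{{.Id}}' rank-0)
    echo 100000 > /sys/fs/cgroup/cpu/docker/$cid/cpu.cfs_period_us
    echo  50000 > /sys/fs/cgroup/cpu/docker/$cid/cpu.cfs_quota_us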
In our future work we will perform benchmarking to quantify the impact of Docker's NAT networking on message passing. We will also investigate the orchestration of Docker containers holding parallel applications in the cloud environment, as opposed to traditional cluster computing.
Acknowledgments. The experimental results for this work could not have
been obtained without the resources and support provided by the QueensGate
Grid (QGG) at The University of Huddersfield.
References

1. BLAS (Basic Linear Algebra Subprograms). http://www.netlib.org/blas/.
2. Docker. https://www.docker.com/.
3. Dockerfile reference - Docker documentation. https://docs.docker.com/reference/builder/. Version 1.4.
4. Intel Math Kernel Library LINPACK download. https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download.
5. Kernel Based Virtual Machine. http://www.linux-kvm.org/.
6. MPICH: high-performance portable MPI. http://www.mpich.org/.
7. Open MPI: open source high performance computing. http://www.open-mpi.org/.
8. Charlie Catlett. Standards for grid computing: Global Grid Forum. Journal of Grid Computing, 1(1):3-7, 2003.
9. Rion Dooley. Agave docker quickstart. https://bitbucket.org/deardooley/agave-docker-support/, September 2014.
10. Rion Dooley, Matthew Vaughn, Dan Stanzione, Steve Terry, and Edwin Skidmore. Software-as-a-service: The iPlant Foundation API. In 5th IEEE Workshop on Many-Task Computing on Grids and Supercomputers, November 2012.
11. W. Emeneker and D. Stanzione. Dynamic virtual clustering. In Cluster Computing, 2007 IEEE International Conference on, pages 84-90, September 2007.
12. Wes Felter, Alexandre Ferreira, Ram Rajamony, and Juan Rubio. An updated performance comparison of virtual machines and Linux containers. IBM Research Report RC25482, 2014.
13. Michael Kerrisk. Namespaces in operation, part 1: namespaces overview. http://lwn.net/Articles/531114/, January 2014. Accessed 7 Feb 2015.
14. Paul Menage. Cgroups. https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt. Accessed 7 Feb 2015.
15. Jim Smith and Ravi Nair. Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann, 2005.
16. Todd Tannenbaum. HTCondor and HEP partnership and activities. Presented at the HEPiX Fall 2014 Workshop, University of Nebraska, Lincoln, 13-17 October 2014.
17. Geoffroy Vallee, Thomas Naughton, and Stephen L. Scott. System management software for virtual environments. In Proceedings of the 4th International Conference on Computing Frontiers, pages 153-160. ACM, 2007.
18. Aaron Weiss. Computing in the clouds. netWorker, 11(4):16-25, December 2007.