Orchestrating Docker Containers in the HPC
Environment
Joshua Higgins, Violeta Holmes, Colin Venters
The University of Huddersfield,
Queensgate, Huddersfield, UK
{joshua.higgins, v.holmes, c.venters}@hud.ac.uk
Abstract. Linux container technology has more than proved itself use-
ful in cloud computing as a lightweight alternative to virtualisation,
whilst still offering good enough resource isolation. Docker is emerging as
a popular runtime for managing Linux containers, providing both man-
agement tools and a simple file format. Research into the performance
of containers compared to traditional Virtual Machines and bare metal
shows that containers can achieve near native speeds in processing, mem-
ory and network throughput. A technology born in the cloud, it is making
inroads into scientific computing both as a format for sharing experimen-
tal applications and as a paradigm for cloud based execution. However,
it has unexplored uses in traditional cluster and grid computing. It pro-
vides a run time environment in which there is an opportunity for typical
cluster and parallel applications to execute at native speeds, whilst being
bundled with their own specific (or legacy) library versions and support
software. This offers a solution to the Achilles heel of cluster and grid
computing that requires the user to hold intimate knowledge of the local
software infrastructure. Using Docker brings us a step closer to more ef-
fective job and resource management within the cluster by providing both
a common definition format and a repeatable execution environment. In
this paper we present the results of our work in deploying Docker con-
tainers in the cluster environment and an evaluation of its suitability as
a runtime for high performance parallel execution. Our findings suggest
that containers can be used to tailor the run time environment for an
MPI application without compromising performance, and would provide
better Quality of Service for users of scientific computing.
Keywords: Linux containers, Docker, cluster, grids, run time environ-
ment
1 Introduction
Cloud computing has been driven by the Virtual Machine (VM). They are widely
deployed to achieve performance and resource isolation for workloads by con-
straining the amount of virtual memory and processor cores available to a guest
system. This allows resource sharing on a massive scale; VMs can be provisioned
with any software environment and kept separate from other guests on the same
physical server[18].
Linux container technology is classed as an operating system virtualisation
method. It allows the creation of separate userspace instances in which the same
kernel is shared. This provides functionality similar to a VM but with a lighter
footprint. The Docker project provides a management tool and its own library
for communicating with containment features in the OS kernel[2].
Resource isolation takes a back seat in HPC systems which generally execute
a user’s job within the same OS environment that runs directly on the hardware
to gain the best performance. This poses a problem for application portability,
especially in grid systems where a remote resource may lack the libraries or sup-
port software required by a job. This undermines efforts by middleware vendors
to unify resources and provide a common format for accessing heterogeneous
systems[8]. In this respect some features of the cloud are desirable in cluster
computing.
Docker containers offer an opportunity to create cloud-like flexibility in the
cluster without incurring the performance limitations of a VM. This paper in-
vestigates the performance gains that can be achieved using Docker containers
for executing parallel applications when compared to the KVM hypervisor[5] in
a typical scientific cluster computing environment. We also propose a method of
executing an MPI job encapsulated in a Docker container through the cluster
resource manager.
In the sections of this paper, a short review of Docker containers versus
Virtual Machines will be conducted. Current work will then be discussed.
Section 4 describes the proposed Docker in the HPC cluster solution. The results
from the deployment of this implementation will be evaluated and future work
identified.
2 Docker vs KVM
2.1 Architecture
KVM is a popular hypervisor for Linux that introduces virtualisation support
into the Linux kernel. The hypervisor provides an illusion to the guest OS that
it is managing its own hardware resources [15]. A request from the guest must
be translated into a request to the underlying physical hardware; a process in
modern hypervisors that is highly optimised and transparent to the guest. This
allows the hypervisor to host a VM without modifications to the OS that has
been chosen.
Docker containers do not require a layer of translation. On Linux, they are
implemented using ’cgroups’, a feature in the Linux kernel that allows the re-
sources (such as CPU, memory and network) consumed by a process to be con-
strained[14]. The processes can then be isolated from each other using kernel
’namespaces’[13]. This fundamental difference requires that the guest system
processes are executed by the host kernel, restricting containers on a Linux host
to only other Linux flavours. However, it means that an executable within the
container system essentially runs with no additional overhead compared to an
executable in the host OS. A container is not required to perform any system
initialisation - its process tree could contain just the program being run and any
other programs or services that it depends on.
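To make the mechanism concrete, the sketch below exercises the same two kernel features directly, outside Docker; it assumes a cgroup v1 hierarchy mounted under /sys/fs/cgroup, root privileges and a placeholder executable ./my_parallel_app, and is illustrative rather than a production recipe.

# Create a cgroup in the cpu and memory hierarchies and set limits
mkdir /sys/fs/cgroup/cpu/demo /sys/fs/cgroup/memory/demo
echo 50000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us         # ~0.5 of one core
echo 512M  > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes

# Place the current shell (and its children) under those limits
echo $$ > /sys/fs/cgroup/cpu/demo/tasks
echo $$ > /sys/fs/cgroup/memory/demo/tasks

# Run the workload in its own PID, mount and network namespaces;
# the process still executes directly on the host kernel
unshare --pid --mount --net --fork ./my_parallel_app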
2.2 Performance
A benchmark by IBM Research demonstrates that the performance of a Docker
container equals or exceeds KVM performance in CPU, memory and network
benchmarks [12]. The results show that neither method introduces a mea-
surable overhead for CPU and memory performance. However, the Linpack per-
formance inside KVM is shown to be very poor; the hypervisor abstracts the
hardware and processor topology which does not allow tuning and optimisation
to take place.
They also suggest that the preferential scaling topology for Docker containers
is by processor socket. By not allowing a container to span cores distributed over
multiple processor sockets, it avoids expensive inter-processor bus communica-
tion. This is in line with the philosophy already ingrained in cluster computing
applications in which a process per core is executed on compute nodes. However,
the effect may not be appreciable in distributed memory applications where the
bandwidth of the network interconnect may be many orders of magnitude slower.
2.3 Storage
A Virtual Machine is traditionally accompanied by a disk image which holds the
OS and run time applications. To create or modify this disk image would require
the user to hold the knowledge of installing the OS, or systems management
experience. The resulting image file may span several gigabytes. This places a
large burden on storage and may be inconvenient to transfer between systems.
The Docker project introduces a concept of a ’Dockerfile’ which allows the
userspace of a container to be described as a list of directives in a text file that
construct the real image[3]. Each directive produces a layer of the final system
image; the layers are combined at run time using a copy-on-write method to appear
to the container as a single unified image. This allows multiple containers to
share common image layers, potentially reducing the amount of data required to
transfer between systems. The Dockerfile itself is significantly easier to customise
and can be easily version controlled or shared.
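As an illustration, a hypothetical MPI application image could be described and built along the following lines; the base image, package names and image tag are placeholders rather than the exact environment used in our experiments.

# Each directive in the Dockerfile produces one layer of the image
cat > Dockerfile <<'EOF'
FROM centos:6
RUN yum install -y openssh-server openmpi
RUN useradd hpcuser
COPY my_solver /home/hpcuser/my_solver
EOF

docker build -t cluster/my_solver .    # layers are cached and can be shared
docker push cluster/my_solver          # publish to a (private) registry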
3 Current Work
The flexibility of the cloud allows one to create multiple clusters where each
individual virtual cluster can have its own customised software environment.
This draws parallels with the development of the now defunct OSCAR-V mid-
dleware[17] and Dynamic Virtual Clustering[11] concepts, which are able to pro-
vision a virtual cluster based on the requirements of a job at the time of submis-
sion. These systems still incur the overhead of running a VM as the job execution
environment and inherit the associated performance limitations.
The Agave API is a gateway platform that provides services typically found
in grid systems tailored for scientific users. It has a strong focus on web standards
and boasts support for running applications packaged in Docker containers in a
cloud[10]. However, Agave orchestrates a single container per job, which limits
the scope for running parallel applications[9].
HTCondor is a middleware for high-throughput computing that has support
for running jobs in parallel on dedicated machines, using ’universes’ to distin-
guish between different execution paradigms. HTCondor itself already supports
’cgroups’ to constrain the resources available to a job on Linux hosts and a
universe is under development to provide support for the Docker container for-
mat[16]. HTCondor has powerful resource discovery features, but the usefulness
of a container lies precisely in not needing to know the details of the execution host.
4 Docker in the HPC Cluster
Implementing existing container and VM solutions in HPC systems requires
modifications to the software stack of the local HPC resources. The HPC sys-
tems would already have resource management and job scheduling in place. The
methodology should follow the concept of containers, that is to abstract the ap-
plication from the software stack of the resource. Modifying core components of
this stack to support containers introduces a portability problem. Any standard
resource manager provides all the information required to orchestrate a Docker
container within a resource.
A resource manager such as Torque uses a script that is the core of the job
execution and is responsible for configuring the environment and passing the
required information to a process starter, such as ’mpirun’. We use both the
MPICH[6] and OpenMPI[7] launchers regularly in our systems, which support
SSH to enable remote process execution.
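For reference, a conventional, non-containerised launch from such a script relies only on the node list exported by the resource manager; the application name below is a placeholder.

# Conventional launch from a Torque/PBS job script: the node list is
# provided by the resource manager and mpirun starts remote ranks over SSH
NP=$(wc -l < "$PBS_NODEFILE")
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./my_mpi_app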
4.1 Choosing a container model
Since we concluded that running a container has no appreciable overhead over
running a normal executable, we propose two different container models for
parallel execution. A single container could be started to mimic the worker node,
as shown in Figure 1, which would hold all the processes assigned to that node.
Secondly, a container per process could also be orchestrated as shown in
Figure 2. Whilst it is unlikely that processes within the same job would require
different run time environments, this presents an interesting opportunity for
resource usage accounting and can offer real time adjustment of the resource
constraints per process through ’cgroups’. It can also enforce a process to be
mapped to a specific processor core if this functionality is missing from the
process launcher.
Fig. 1. A container per node that holds all respective processes
Fig. 2. A container per process is orchestrated on each node
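For illustration, the two models might be orchestrated on a four-core worker node along the following lines; the image name, the port range and the presence of an SSH daemon in the image are assumptions for this sketch, and older Docker releases name the pinning option --cpuset rather than --cpuset-cpus.

IMG=cluster/mpi-env                 # hypothetical image containing sshd and the MPI libraries
PORT=$((20000 + RANDOM % 10000))    # randomised host port for the container's sshd

# Model 1 (Fig. 1): one container per node, holding all ranks for that node
docker run -d --name mpi-node -p ${PORT}:22 $IMG /usr/sbin/sshd -D

# Model 2 (Fig. 2): one container per rank, each pinned to a single core
# through the cgroup cpuset controller
for core in 0 1 2 3; do
    docker run -d --name mpi-rank-${core} -p $((PORT + core)):22 \
        --cpuset-cpus="$core" $IMG /usr/sbin/sshd -D
done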
4.2 Container model implementation
The resource manager will not be aware of any containers, so the script that
is central to the job execution will be used to prepare the nodes and start the
containers, before the process launcher starts the parallel job. An overview of
this process is described in Figure 3. We cannot assume that the user is able to
SSH to the worker nodes to run the container preparation commands. In this
case the bootstrap script can also be invoked through the process launcher first
and then the process launcher is called a second time to invoke the parallel job.
The script process outlined supports both container models.
When the process launcher is called, the environment has been modified in
two ways: the container will randomise its SSH port and expose it on the same
interface as the host. The SSH configuration is temporarily changed so that
the process launcher will use this port instead of the default, thereby launching
the process within the container and not on the node. However, if the process
launcher forks the process on the rank 0 node instead of using SSH, the process
will run outside the container. To avoid this condition, the script will write a
unique hostname alias for each node into the SSH configuration that maps to
the container on that node. These are substituted into the list of nodes provided
by the resource manager before being passed to the process launcher.
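A simplified sketch of this final stage is given below; the alias prefix, the container user and the way each node's randomised port is recovered (a file assumed to be written during bootstrap) are illustrative rather than the exact implementation used in our scripts.

# Write one SSH host alias per allocated node, pointing at the container's
# randomised sshd port (assumed here to be recorded by the bootstrap step
# in /tmp/container_ssh_port on each node)
for node in $(sort -u "$PBS_NODEFILE"); do
    port=$(ssh "$node" 'cat /tmp/container_ssh_port')
    cat >> ~/.ssh/config <<EOF
Host container-$node
    HostName $node
    Port $port
    User hpcuser
EOF
done

# Substitute the aliases into the node list provided by the resource manager,
# so that every rank, including rank 0, is started inside a container over SSH
sed 's/^/container-/' "$PBS_NODEFILE" > containerfile
mpirun -np "$(wc -l < containerfile)" -machinefile containerfile ./my_mpi_app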
5 Deployment of container model on HPC cluster
Fig. 3. Overview of the script process: on each worker node the script starts the
Docker daemon if it is not already running, pulls the image and starts the container,
creates the container user, exposes SSH on a random port and writes the SSH host
configuration; the process launcher is then invoked and the containers are stopped
when the PBS script finishes.
Eridani is the general purpose, 136 core cluster within the QueensGate Grid at
the University of Huddersfield. Like the other clusters within this campus grid, it
uses the Torque resource manager running on CentOS 6.5 and accepts jobs in a
PBS script format. The Intel-optimised parallel LINPACK version 11.2.1.009[4]
benchmark was used to measure the performance of 2 nodes with 4 cores each.
In order to validate the claim in [12] that for the non-optimised use case
there is no appreciable difference in the CPU execution performance between a
Docker container and a VM, the same benchmark was run using the reference
BLAS version 3.5.0 library without architecture specific optimisations[1].
The containers were orchestrated in both container per-core and container
per-node models, using the implementation described in the previous section.
The VMs were created to consume all available cores on the host and appropriate
memory to fit the LINPACK problem size.
5.1 Results
Figure 4 shows the experimental results using both the Intel-optimised parallel
LINPACK and generic BLAS library comparing native, Docker container models
and KVM performance. The results have been obtained from 10 runs of each
configuration. The peak performance observed per configuration is shown.
5.2 Evaluation
The LINPACK experimental results echo those obtained by previous research[12],
showing that the performance of Docker containers has no appreciable difference
compared to running natively, whilst the VM achieves approximately half the
peak performance of the container. However, this work differs significantly in
that the parallel LINPACK uses a distributed memory model, not shared mem-
ory, utilising a network interconnect for message passing to achieve data sharing
between processes.
Fig. 4. LINPACK benchmarking results: peak Gflops for the native, per-node
container, per-process container and KVM configurations, shown for both the
Intel-optimised LINPACK and the generic BLAS LINPACK.
Without optimisation for the processor architecture, the performance of a
VM and a container is mostly comparable. However, the overall peak perfor-
mance is considerably lower for this application. This suggests that the Docker
container is therefore a more appropriate execution method for high performance
parallel applications where we are likely to employ these types of optimisation.
There is no difference in performance between the two container orchestration
models proposed. This is expected given that a process within the container is
essentially a process running in the host kernel with ’cgroup’ limitations.
6 Summary
One of the requirements of grid computing is to run a job transparently to the
user on any resource they desire without requiring knowledge of the local software
configuration. Based on our research and experimental results conducted, it is
evident that Docker containers can facilitate this by abstracting the software
environment of the local HPC resource without compromising performance. This
improves Quality of Service for our users by:
- allowing parallel jobs to run on traditional PBS clusters with arbitrary run
time environments;
- reducing the entry level for customising the run time environment to that of
the average user;
- running jobs on grid resources that were previously unusable due to their
software configuration.
7 Future Work
The container per process model offers many advantages by allowing us to apply
’cgroup’ constraints to each process in an HPC job. This would allow resource
management to be improved based on job requirements as more fine grained
control can be achieved for network and disk I/O usage in addition to CPU
time[14]. It also provides scope for optimising the power consumption of a job
as limits can be changed in real time without restarting the process.
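As a sketch of this idea, the CPU quota of a single per-process container (named mpi-rank-3 here purely for illustration) could be changed while it runs by writing to its cgroup directly; this assumes Docker's default cgroupfs layout and root access, and newer Docker releases expose the same capability through 'docker update'.

# Restrict one per-process container to half of one core at run time,
# without restarting it, by writing to its cgroup
CID=$(docker inspect --format '{{.Id}}' mpi-rank-3)
echo 50000 > /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_quota_us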
In our future work we will perform benchmarking to appreciate the impact
of Docker’s NAT networking on message passing. We will also investigate or-
chestration of Docker containers that contain parallel applications in the cloud
environment as opposed to traditional cluster computing.
Acknowledgments. The experimental results for this work could not have
been obtained without the resources and support provided by the QueensGate
Grid (QGG) at The University of Huddersfield.
References
1. BLAS (Basic Linear Algebra Subprograms). http://www.netlib.org/blas/.
2. Docker. https://www.docker.com/.
3. Dockerfile reference - Docker documentation. https://docs.docker.com/reference/builder/. Version 1.4.
4. Intel Math Kernel Library LINPACK download. https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download.
5. Kernel Based Virtual Machine. http://www.linux-kvm.org/.
6. MPICH: High-performance portable MPI. http://www.mpich.org/.
7. Open MPI: Open source high performance computing. http://www.open-mpi.org/.
8. Charlie Catlett. Standards for grid computing: Global Grid Forum. Journal of Grid Computing, 1(1):3–7, 2003.
9. Rion Dooley. Agave Docker quickstart. https://bitbucket.org/deardooley/agave-docker-support/, September 2014.
10. Rion Dooley, Matthew Vaughn, Dan Stanzione, Steve Terry, and Edwin Skidmore.
Software-as-a-service: The iplant foundation api. In 5th IEEE Workshop on Many-
Task Computing on Grids and Supercomputers, November 2012.
11. W. Emeneker and D. Stanzione. Dynamic virtual clustering. In Cluster Computing,
2007 IEEE International Conference on, pages 84–90, Sept 2007.
12. Wes Felter, Alexandre Ferreira, Ram Rajamony, and Juan Rubio. An updated performance comparison of virtual machines and Linux containers. IBM Research Report RC25482, 2014.
13. Michael Kerrisk. Namespaces in operation, part 1: namespaces overview. http://lwn.net/Articles/531114/, January 2014. Accessed 7 Feb 2015.
14. Paul Menage. Cgroups. https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt. Accessed 7 Feb 2015.
15. Jim Smith and Ravi Nair. Virtual Machines: Versatile Platforms For Systems And
Processes. Morgan Kaufmann, 2005.
16. Todd Tannenbaum. Htcondor and hep partnership and activities. Presented at the
HEPiX Fall 2014 Workshop, University of Nebraska, Lincoln, 13-17 October 2014,
2014.
17. Geoffroy Vallee, Thomas Naughton, and Stephen L Scott. System management
software for virtual environments. In Proceedings of the 4th international conference
on Computing frontiers, pages 153–160. ACM, 2007.
18. Aaron Weiss. Computing in the clouds. netWorker, 11(4):16–25, December 2007.