VAST: Integrating Volunteer Computing into Existing
Cyberinfrastructure
Lucas A. Wilson
Texas Advanced Computing Center
The University of Texas at Austin
lucaswilson@acm.org
David P. Anderson
Space Sciences Laboratory
University of California Berkeley
davea@ssl.berkeley.edu
July 31, 2017
Abstract

We describe the goals of the VAST project, which seeks to integrate academic high performance computing, academic cloud, and BOINC-based volunteer computing capabilities into a unified capability for executing high throughput computing workloads. The VAST project will also integrate volunteer computing with science gateways and APIs, enabling more researchers to take advantage of volunteer resources.

We describe changes to BOINC that support VAST by allowing volunteers to support science areas rather than specific projects, and that provide a central mechanism for allocating computing power to projects. We also describe Herd, a VAST resource at TACC which serves as a prototype and development platform for the VAST project's tools and APIs.
1 Introduction
Volunteer computing (VC) uses consumer digital products, such as PCs, mobile devices, and game consoles, for high-throughput computing (HTC). People volunteer because they want to support scientific goals, compete based on computing power, and participate in online communities [16].
Most VC projects use BOINC, an open-source middleware system [4]. Using the BOINC server software, scientists create and operate VC "projects." Volunteers participate in these projects by installing the BOINC client and choosing projects to support. BOINC handles VC-specific issues such as device churn, failures, device heterogeneity, and verifying result correctness.
500,000 devices are actively participating in VC projects, mostly via BOINC. These devices have about 2.3 million CPU cores and 290,000 GPUs, and collectively provide an average throughput of 93 PetaFLOPS. The computers are primarily modern, high-end machines: they average 16.5 CPU GigaFLOPS and 11.4 GB of RAM.
The cost of volunteer computing is paid primarily by volunteers, who buy hardware and electricity. Scientists have only to maintain a server. In most cases, their cost is less than 1% of the cost of equivalent computing power on a commercial cloud.
Despite VC's tremendous capacity and economy, it is currently used by relatively few scientists. There are a dozen or so major projects, such as SETI@home [19], Einstein@home [2], Folding@home [13], IBM World Community Grid [10], Rosetta@home [5], GPUgrid, and ClimatePrediction.net. These are generally large research projects with long-term continuous computing needs.

For most scientists, however, creating a BOINC project is not feasible. It requires a range of technical skills; it's not suited to sporadic computing needs; and it requires a significant initial investment with no guarantee of computing power.
To solve these problems, and to make the power of VC available to all scientists, we are taking a new approach: adding BOINC-based VC back ends to existing computing providers such as high performance computing (HPC) centers and science gateways, through a project we are calling VAST. This infrastructure will be operated by center staff rather than by scientists; in fact, scientists need not be involved at all; they will simply see higher throughput and lower queueing delays. Initially, we are developing VAST prototypes at the Texas Advanced Computing Center (TACC) and at nanoHUB, a popular gateway for nanoscientists. The longer-term goal of the VAST project is to make this infrastructure easy to replicate at other HPC providers, and easy to integrate into additional science gateways.
The larger vision of this project is to fully integrate data-center-based research cyberinfrastructure with volunteer computing resources in a way that allows for seamless use of HPC/HTC clusters, cloud resources, and volunteer systems by researchers from a single interface. In addition, VAST will enable integration with various job-processing APIs and scientific gateway systems, enabling the largest possible set of researchers to access the largest possible set of available computing resources.
2 Background
VAST extends two existing software tools to enable seamless integration of volunteer systems and research cyberinfrastructure. BOINC provides the client-side mechanisms for enabling volunteer computing, as well as the server-side infrastructure for scheduling work units. Launcher is a tool for simplifying execution of sequential jobs on HPC systems. We provide background on both of these tools below.
2.1 Volunteer Computing with BOINC

2.1.1 Remote Job Submission
BOINC provides flexible mechanisms for moving jobs from HTC systems to a BOINC project on a remote server. These were originally developed for an HTCondor [23] interface and have been extended for this project. They consist of HTTP/XML Remote Procedure Calls (RPCs) for submitting, monitoring, and controlling jobs, and for managing input and output files. BOINC supplies bindings in C++, Python, and PHP for these RPCs. BOINC servers must be publicly accessible via HTTP. Input and output files must be moved between these servers and the computing provider's existing non-public servers. The process of running a batch of jobs on BOINC thus consists of these steps:
- Collect the set of input files and application files used by the jobs in the batch. Make an RPC to BOINC asking which of these files it already has. Make a second RPC to upload absent files. These RPCs create an association between the batch and the files, stored in a relational database. Files can safely be deleted from the BOINC server when there are no associations to active batches.

- Make an RPC submitting the jobs. Each job is described by an XML element specifying its applications, input files, and requirements for resources such as RAM, storage, and CPU. These let the BOINC server send the jobs to volunteer PCs capable of handling them.

- Make RPCs monitoring the status of the batch. When complete, get the output files of the batch. These are typically transferred as a single zip file.

- Make an RPC "retiring" the batch. This removes associations to input files, and deletes output files.

Job submitters are identified by accounts on the BOINC projects, and all operations must be authenticated with account credentials.
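To make this RPC flow concrete, the following is a minimal sketch of a batch submission client. It posts raw XML requests; the endpoint name (submit_rpc_handler.php), the element names, and the example application are illustrative assumptions based on BOINC's documented remote job submission interface, and a production client would instead use the C++, Python, or PHP bindings mentioned above.

```python
# Minimal sketch of the batch workflow described above, using raw HTTP/XML
# RPCs. Endpoint, element, and application names are illustrative.
import requests

PROJECT_URL = "https://boinc.example.edu/myproject"   # hypothetical project
AUTH = "ACCOUNT_AUTHENTICATOR"                        # submitter credential

def rpc(xml_request: str) -> str:
    """POST an XML RPC to the project's submission handler (assumed endpoint)."""
    r = requests.post(f"{PROJECT_URL}/submit_rpc_handler.php",
                      data={"request": xml_request}, timeout=60)
    r.raise_for_status()
    return r.text

# Submit a batch of jobs, each described by an XML element giving its
# application and input files (file names are placeholders).
submit_req = f"""
<submit_batch>
  <authenticator>{AUTH}</authenticator>
  <batch>
    <app_name>example_app</app_name>
    <job><input_file><source>input_001.dat</source></input_file></job>
    <job><input_file><source>input_002.dat</source></input_file></job>
  </batch>
</submit_batch>"""
print(rpc(submit_req))   # response would carry the new batch ID

# Poll batch status until complete, then fetch the zipped output files and
# retire the batch (XML parsing of responses omitted; batch ID is a placeholder).
status_req = f"<query_batch><authenticator>{AUTH}</authenticator><batch_id>123</batch_id></query_batch>"
print(rpc(status_req))
```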
BOINC also provides mechanisms for access control (limiting which applications a particular user can submit jobs to) and resource allocation (dividing computing power fairly among competing users, based on assigned resource shares).
2.1.2 Running Linux Applications on Home PCs

The set of computers currently running BOINC is 86% Windows, 7% Mac OS X, and 7% Linux [3]. Originally it was necessary to build and test versions of science applications for each platform. More recently, BOINC added the capability of using virtual machine technology [20] (specifically, VirtualBox [27]) to run Linux applications on Windows and Mac computers (see Figure 1(a)). This works as follows:
- The default BOINC installer includes VirtualBox.

- BOINC provides a program called vboxwrapper that interfaces the BOINC client to a VirtualBox VM. Compiled versions of vboxwrapper are available for the major platforms.

- Scientists create a VirtualBox VM image for their environment of choice (OS version, libraries, etc.) and create BOINC applications consisting of vboxwrapper, the VM image, and the science application to run within the VM.
We are using Docker [15] to create these virtual environments (Figure 1(b)). The VM image shipped to clients is a minimal system able to run Docker images; it is small (30 MB). At TACC, we add Docker layers to create the default Stampede node environment, and add further layers for specific applications, as well as the applications themselves. These layers are separate files which are cached on client computers, so data transfer is minimized. Other HTC service providers will be provided with instructions on how to create their own Docker layers to emulate their computing environment, so that jobs can be offloaded from their systems as well. Using this approach, it is possible to run most or all of a typical computing provider's HTC applications with little to no additional setup work.

Figure 1: BOINC client execution modes for non-native applications: (a) virtual machine-based application; (b) Docker-based application.
2.2 Launcher

Launcher [31, 30, 28, 29] is a Bourne shell-based, lightweight, scalable framework for bundling large numbers of sequential, single-process, and single-node applications for submission to HPC clusters whose scheduling policies favor long-running parallel jobs. Launcher has been in use at TACC for more than a decade, and is used by research groups across the globe to increase their computational productivity.
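As a brief illustration (ours, not taken from the Launcher papers), a Launcher run is driven by a plain-text job file containing one command per line; a batch script then points Launcher at that file and runs it inside a single parallel job. The helper below simply generates such a job file; the environment variable and command names in the trailing comment reflect conventional Launcher usage and should be checked against the Launcher documentation.

```python
# Hypothetical helper that writes a Launcher job file: one sequential
# command per line, which Launcher then spreads across the nodes and cores
# of a single batch-scheduled parallel job.
from pathlib import Path

def write_launcher_jobfile(path: str, commands: list[str]) -> None:
    Path(path).write_text("\n".join(commands) + "\n")

write_launcher_jobfile(
    "jobfile",
    [f"./analyze --input sample_{i:04d}.dat --out results/{i:04d}.txt"
     for i in range(1000)],
)
# A batch script would then set LAUNCHER_JOB_FILE=jobfile (variable name per
# the Launcher documentation) and invoke Launcher inside an HPC job.
```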
2.2.1 Support for Manycore Architectures

Launcher has support for both 1st and 2nd generation Intel Xeon Phi (cards and self-hosted, respectively). To do this, Launcher interfaces with the hwloc software package [6] to detect the logical core layout of the host system, and then binds logical core ranges to particular Launcher tasks. Launcher can also detect and either ignore or make use of hyperthreads/contexts.

Figure 2: Science gateways enabled by Launcher: (a) DrugDiscovery@TACC; (b) VDJ Portal.
2.2.2 Existing Integration with Scientific Gateways

Existing frameworks such as the Agave API [8] incorporate Launcher to enable parametric functionality. This allows scientific gateways such as DrugDiscovery@TACC [25] (Figure 2(a)), VDJServer [24] (Figure 2(b)), and CyVerse [14, 9] to execute hundreds of thousands of small analysis jobs on cyberinfrastructure such as Stampede at TACC without requiring the user to learn Linux or command-line tools. This capability has enabled thousands of researchers, who might otherwise have been unwilling to move from personal-scale computing to national research cyberinfrastructure, to increase their available computational capacity and, potentially, their scientific output and impact.
3 VAST Vision

Currently, tools like Launcher provide users with a convenient mechanism for executing sequential and single-process applications on large-scale cyberinfrastructure. Unfortunately, the demand for high throughput computing continues to outpace growth of available resources in the national research cyberinfrastructure.
Because of this widening gap between demand and capability, we are exploring the use of volunteer cycles to help make up the deficiency. In the past, using volunteer computing required a concerted effort on the part of the research group to develop applications, run server-side scheduling and data-movement services, and advertise the project in order to attract volunteers. In many situations this was a worthwhile investment for the research group; however, not all investigators have the resources to devote to such a large investment with potentially limited results. For many research groups, especially groups that are only now moving to national-scale research cyberinfrastructure, a better approach is needed.
The larger vision of this project is to fully integrate datacenter-located research cyberinfrastructure with volunteer computing resources in a way that allows for seamless use of HPC/HTC clusters and MPPs, cloud resources, and volunteer systems by researchers from a single interface. In addition, VAST will enable integration with various science APIs and scientific gateways.
3.1 Back-end Platform Integration

HTC jobs do not typically have hardware requirements that necessitate binding to a particular resource. As such, there is huge potential for optimizing job placement and execution. Jobs which do not require parallel file systems and high-speed interconnects can be moved automatically to more appropriate platforms, such as academic clouds and volunteer systems, opening up cycles on large-scale parallel systems for jobs which do require that type of resource.

Figure 3 illustrates our vision for the VAST project. The end goal is to integrate many traditional HPC resources with VAST (which may consist of both citizen volunteers and cloud-based volunteers), using a global filesystem or object store as a means of coordinating input files and results.
Using Containers to Simplify Platform Migration. In order to facilitate easy migration between HPC platforms (such as Stampede2 [21]), academic clouds (such as Jetstream [22]), and volunteer PCs, user applications will be packaged within containers which can be executed across the various platforms. While BOINC has already used Docker [15] (see Section 2.1.2) to achieve this type of flexibility, many CI providers have reservations about using Docker on shared production systems, mostly concerning the elevated permissions required to execute the container infrastructure [7].

Because of this, we are also looking at user-space container solutions for the VAST project, such as Singularity [12] and Shifter [11], as a means of providing automatic cross-platform migration. Singularity will be the first target, as many of the applications appropriate for VAST have already been containerized using Singularity at TACC. In fact, Singularity containers can currently be run on Stampede2, Lonestar5 [18], Hikari, and Jetstream at TACC.
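As an illustration of the kind of runtime selection this implies (our sketch, not VAST code), a migration wrapper might prefer a user-space runtime where one is installed and fall back to Docker inside the volunteer VM; the image names and command are placeholders.

```python
# Illustrative only: choose a user-space container runtime when available,
# otherwise fall back to Docker (e.g. inside the volunteer VM of Section 2.1.2).
import shutil
import subprocess

IMAGE_SIF = "vast-app.sif"              # hypothetical Singularity image
IMAGE_DOCKER = "example/vast-app:latest"  # hypothetical Docker image

def run_containerized(command: list[str]) -> int:
    if shutil.which("singularity"):
        argv = ["singularity", "exec", IMAGE_SIF] + command
    elif shutil.which("docker"):
        argv = ["docker", "run", "--rm", IMAGE_DOCKER] + command
    else:
        raise RuntimeError("no supported container runtime found")
    return subprocess.run(argv, check=False).returncode

if __name__ == "__main__":
    run_containerized(["./analyze", "--input", "sample.dat"])
```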
3.2 VAST API

One of the long-term goals of the VAST project is to provide a set of public application programming interfaces (APIs) which encapsulate all of the capabilities of VAST. By way of the VAST API, a gateway developer would be able to build a submission mechanism for BOINC job batches, check the status of those batches, recover the data, and query usage statistics.

In the same way, developers of tools such as Launcher, or of resource management systems such as SLURM [32], would be able to use the VAST API to integrate VAST with their systems as a potential replacement of, or supplement to, tools such as job arrays.
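Since the VAST API does not yet exist, the following is a purely hypothetical sketch of what a gateway-facing client might look like; every endpoint, field, and class name here is invented for illustration.

```python
# Hypothetical sketch of a VAST API client from a gateway developer's
# perspective; none of these endpoints exist yet.
import requests

class VASTClient:
    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {token}"}

    def submit_batch(self, app: str, jobs: list[dict]) -> str:
        """Submit a batch of jobs; returns a batch identifier."""
        r = requests.post(f"{self.base_url}/batches",
                          json={"app": app, "jobs": jobs},
                          headers=self.headers, timeout=60)
        r.raise_for_status()
        return r.json()["batch_id"]

    def batch_status(self, batch_id: str) -> dict:
        """Query the status of a previously submitted batch."""
        r = requests.get(f"{self.base_url}/batches/{batch_id}",
                         headers=self.headers, timeout=60)
        r.raise_for_status()
        return r.json()

# Example (hypothetical gateway usage):
# client = VASTClient("https://vast.example.org/api", token="...")
# bid = client.submit_batch("example_app", [{"input": "input_001.dat"}])
# print(client.batch_status(bid))
```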
3.3 Matchmaking Jobs and Volunteers

In the original BOINC model, each BOINC project serves a single research group, and volunteers are expected to browse the available projects and choose those whose research goals they support most. This model doesn't work well in the context of VAST, because cyberinfrastructure resource providers process jobs in a wide range of science areas.

To address this problem, we are creating a new framework for volunteering, in which volunteers register at a central web site (call it vast.ci) and express their computing preferences in terms of science goals rather than projects.
Vast.ci serves as a "meta-scheduler." Each volunteer computer periodically contacts vast.ci and receives a list of projects to attach to. Vast.ci maintains a list of vetted BOINC projects (including existing projects as well as the new ones at TACC and nanoHUB). This list, and the way in which computing power is divided among the projects, is managed centrally (see below). In particular, a new project can be given computing power without having to do its own PR and volunteer recruitment.
Figure 3: The VAST Cyberinfrastructure Vision
We have defined a hierarchy of keywords to describe science areas (e.g. biomedicine, disease research, cancer research) and project locations (e.g. Europe, United Kingdom, Oxford University). The set of keywords, the hierarchy, and the keyword text are mutable, but each keyword has an associated permanent integer ID.

Using a web interface at vast.ci, volunteers can assign yes/no/maybe preferences to keywords. The default is "maybe". "No" means the volunteer is unwilling to process any jobs having that keyword. "Yes" means that the volunteer prefers jobs having that keyword.
Jobs submitted via VAST APIs at TACC and nanoHUB are associated with a set of keywords. The BOINC scheduler uses keywords, together with user preferences, to decide what jobs to send in response to a work request.

In addition, projects can have associated keywords, and vast.ci takes these into account in deciding which projects to assign to a particular volunteer.
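The following sketch (ours, not the vast.ci implementation) illustrates the yes/no/maybe semantics described above: jobs carrying any keyword marked "no" are never sent, and jobs carrying "yes" keywords are preferred.

```python
# Illustration of the yes/no/maybe keyword semantics described above.
from enum import Enum

class Pref(Enum):
    NO = 0
    MAYBE = 1   # default preference
    YES = 2

def job_allowed(job_keywords: set[int], prefs: dict[int, Pref]) -> bool:
    """A job is never sent if any of its keywords is marked 'no'."""
    return all(prefs.get(k, Pref.MAYBE) is not Pref.NO for k in job_keywords)

def job_score(job_keywords: set[int], prefs: dict[int, Pref]) -> int:
    """Prefer jobs carrying keywords the volunteer marked 'yes'."""
    return sum(1 for k in job_keywords if prefs.get(k, Pref.MAYBE) is Pref.YES)

# Example: volunteer prefers keyword 12, refuses keyword 40 (IDs are placeholders).
prefs = {12: Pref.YES, 40: Pref.NO}
jobs = {"job_a": {12, 7}, "job_b": {40, 7}, "job_c": {7}}
eligible = {j: kw for j, kw in jobs.items() if job_allowed(kw, prefs)}
best_first = sorted(eligible, key=lambda j: job_score(eligible[j], prefs), reverse=True)
print(best_first)   # ['job_a', 'job_c']; job_b is excluded by the 'no' keyword
```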
As a meta-scheduler, vast.ci allocates computing power among projects. This is based on a "linear bounded" model that efficiently supports both continuous and sporadic computing requirements. Each project has a parameter that determines its resource share under continuous load. These shares will be determined by an XSEDE-type allocation policy. In conjunction with a similar model for division of resources among job submitters within a project, this allows for a form of guaranteed quality-of-service (QoS) to individual scientists.
4 Herd: A VAST Resource at TACC

Sequential execution on TACC resources such as Stampede has been increasing steadily over the last several years. In 2016, high throughput jobs were run through TACC's Launcher framework by more than 700 of the nearly 4,000 active users on Stampede (~17.5%, see Figure 4), based on data collected from the XALT job analysis utility [1].
This trend is expected to continue as scientific communities whose problems tend to be more data-oriented than compute-oriented look to national cyberinfrastructure to continue pushing the boundaries of their research. This Big Data problem is often addressed with large-scale HTC workloads consisting of many hundreds or thousands of sequential and single-process applications run to postprocess or analyze disparate data components.
In order to limit the impact of these ever-growing small-scale job requests on batch-scheduled HPC systems, TACC uses Launcher to bundle sequential and single-process tasks into a single parallel job, which is then scheduled on TACC's batch-scheduled systems as though it were a large-scale parallel simulation.
4.1 Candidate Selection

The first challenge with this model is the need to accurately and automatically identify candidates for migration to volunteer resources. This is currently done through a separate submission tool that parses a Launcher job file and determines whether each job in the file meets the appropriate criteria (for a full description of Launcher terminology and methods, see [30]); a minimal version of such a check is sketched below. Candidates for migration should:

- have no non-standard dynamic library dependencies in the executable;

- not use configuration files which might contain references to other input files that would require deep or hierarchical analysis;

- not require network connectivity in order to execute properly, which also excludes distributed-memory parallelism with MPI [26]; and

- not require additional processes, daemons, or services outside of the candidate executable.

Figure 4: Percentage of users on TACC's Stampede system who executed Launcher jobs in 2016.

Figure 5: Current Herd setup at TACC.
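The sketch below illustrates a minimal version of such a check for a single job line; it is our illustration, not the actual submission tool. It uses ldd to look for shared libraries resolved outside standard system paths and scans the command line for obvious signs of MPI or network use; real candidate detection would be more thorough.

```python
# Hedged sketch of a per-job candidate check (not the actual VAST tool).
import shlex
import subprocess

STANDARD_LIB_PREFIXES = ("/lib", "/lib64", "/usr/lib", "/usr/lib64")
NETWORK_HINTS = ("mpirun", "mpiexec", "ibrun", "curl", "wget")

def nonstandard_libs(executable: str) -> list[str]:
    """Return shared libraries resolved outside standard system paths."""
    out = subprocess.run(["ldd", executable], capture_output=True, text=True)
    libs = []
    for line in out.stdout.splitlines():
        if "=>" in line:
            path = line.split("=>")[1].split()[0]
            if path != "not" and not path.startswith(STANDARD_LIB_PREFIXES):
                libs.append(path)
    return libs

def is_migration_candidate(job_line: str) -> bool:
    argv = shlex.split(job_line)
    if not argv:
        return False
    if any(tok in NETWORK_HINTS for tok in argv):   # crude MPI/network check
        return False
    if nonstandard_libs(argv[0]):                   # first criterion above
        return False
    return True

print(is_migration_candidate("./blastp -query q.fa -db nr -out hits.txt"))
```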
4.2 Determining Disk Requirements

If a job is determined to be a candidate according to these criteria, the system then determines whether or not the job's disk space requirements are appropriate for migration to VC resources.

The total disk requirement for the job is the size of the executable, plus the size of all input files, plus the estimated size of all output files. In many cases, it is difficult to know how much space the output files will require. Currently, we estimate that the output files will be roughly equivalent in size to the input files, by assuming a relatively verbose input file format such as FASTA [17].
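A minimal version of this estimate, under the output-roughly-equals-input assumption stated above, might look as follows; the per-volunteer disk limit is a placeholder.

```python
# Sketch of the disk estimate described above: executable size + input
# sizes + estimated output size (assumed roughly equal to the inputs).
import os

def estimate_disk_bytes(executable: str, input_files: list[str]) -> int:
    exe_bytes = os.path.getsize(executable)
    input_bytes = sum(os.path.getsize(f) for f in input_files)
    output_estimate = input_bytes          # crude assumption from the text
    return exe_bytes + input_bytes + output_estimate

DISK_LIMIT_BYTES = 2 * 1024**3             # hypothetical per-volunteer limit

def fits_on_volunteer(executable: str, input_files: list[str]) -> bool:
    return estimate_disk_bytes(executable, input_files) <= DISK_LIMIT_BYTES
```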
4.3 Reintegrating Results

Once migrated jobs have completed, the output files are transferred from the volunteers back to the BOINC project server (see Figure 5). At this point, there are two options for rejoining the migrated result files with the result files of unmigrated jobs:

- server-side actions to automatically move result files to the appropriate location on a global filesystem, or

- client-side actions to pull data off of the BOINC project server and back to the appropriate location on a global filesystem.

At TACC, we have chosen the client-side approach for two reasons. First, our BOINC project server does not have a means of natively mounting the Lustre filesystem. Second, TACC's two-factor authentication requirement means that any server-side operation would require either open permissions on a user's target directory or root-level access to the filesystem in order to drop files. Neither of these was deemed appropriate.
In order to perform the client-side reintegration, a digest file containing the batch ID of the migrated jobs and the directory in which the submission occurred is maintained in the user's HOME directory. Upon login, a startup script queries the BOINC project server to determine whether the batch is complete and, if so, migrates the data back to the appropriate directory and removes the batch's line from the digest file.
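A sketch of that login-time script follows; the digest file format (one tab-separated batch ID and submission directory per line) and the stubbed server query are our assumptions for illustration.

```python
# Sketch of the login-time reintegration described above; file format and
# server queries are illustrative assumptions, not the production script.
import os

DIGEST = os.path.expanduser("~/.vast_digest")   # hypothetical digest path

def batch_is_complete(batch_id: str) -> bool:
    """Stub: replace with a batch-status RPC against the BOINC project server."""
    return False

def fetch_batch_output(batch_id: str, dest_dir: str) -> None:
    """Stub: download and unpack the batch's zipped output files into dest_dir."""
    pass

def reintegrate() -> None:
    if not os.path.exists(DIGEST):
        return
    remaining = []
    with open(DIGEST) as fh:
        for line in fh:
            batch_id, submit_dir = line.rstrip("\n").split("\t")
            if batch_is_complete(batch_id):
                fetch_batch_output(batch_id, submit_dir)
            else:
                remaining.append(line)
    with open(DIGEST, "w") as fh:       # keep only lines for unfinished batches
        fh.writelines(remaining)

if __name__ == "__main__":
    reintegrate()
```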
5 Going Forward

We expect that the experience and technology that come out of this project will encourage and facilitate the use of volunteer computing by other computing providers in the national cyberinfrastructure.
Improving Automatic Migration Capabilities. We will be improving both candidate detection and disk requirement estimation, in order to increase the total potential impact of VC on TACC users' workloads and to allow for better job packing on VC resources (where maximum potential disk usage is a limiting factor). We are investigating other methods for making these predictions quickly.
Expanding Application Support. We will be expanding the number of candidate applications by building supporting Docker containers which mirror the modules available on TACC systems. In this way, we expect to be able to migrate executables with more complex dynamic library dependencies by mimicking the TACC module environment on VC resources.
Expanded Storage Capabilities. We will be investigating alternatives for improving data ingress/egress to and from volunteers by exploring the use of cloud-based file and object storage, as well as the authentication/authorization mechanisms needed to allow volunteers to access researcher-allocated storage, in order to reduce the I/O burden on filesystem infrastructure at resource providers.
Collaboration with SGCI. This paper has focused on the TACC activity. Using a similar container-based approach, we are adding a VC back end to the nanoHUB nanoscience gateway. These capabilities will be integrated into the HUBzero software, which is used by about 20 other science gateways. We plan to work with the Science Gateways Community Institute (SGCI) to provide training and support for adding VC to gateways where it is needed, and for integrating VC capabilities with other gateway software systems.
6 Conclusion

This project is currently in the early stages of development, and much more work remains to be done. However, we hope that this project demonstrates that it is possible to integrate volunteer computing into traditional research cyberinfrastructure in a fairly seamless way. In order to do so, we have developed new software capabilities for both BOINC and TACC resources, which help to ease the burden on users of traditional CI resources. Going forward, we intend to improve these capabilities and provide new mechanisms for recruiting volunteers, so that the national research CI has a ready pool of volunteer resources with which to complement its existing computational capacity.
7 Acknowledgements
This work is supported by NSF grants ACI-1550601,
ACI-1550350, ACI-1664022, and ACI-1664190.
References
[1] K. Agrawal, M. R. Fahey, R. McLay, and D. James. User environment tracking and problem detection with XALT. In Proceedings of the First International Workshop on HPC User Support Tools, HUST '14, pages 32-40, Piscataway, NJ, USA, 2014. IEEE Press.
[2] B. Allen. Einstein@Home, 2005.
[3] D. P. Anderson and K. Reed. Celebrating diversity in volunteer computing. In 2009 42nd Hawaii International Conference on System Sciences, pages 1-8, Jan 2009.
[4] D. P. Anderson. BOINC: A system for public-resource computing and storage. In Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, PA, November 2004. See also http://boinc.berkeley.edu.
[5] D. Baker. Rosetta@home.
http://boinc.bakerlab.org/rosetta/.
[6] F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, and R. Namyst. hwloc: A generic framework for managing hardware affinities in HPC applications. In 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pages 180-186, Feb 2010.
[7] T. Combe, A. Martin, and R. D. Pietro. To
docker or not to docker: A security perspective.
IEEE Cloud Computing, 3(5):54–62, Sept 2016.
[8] R. Dooley and J. Stubbs. Dynamically provisioning portable gateway infrastructure using Docker and Agave. In Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE '14, pages 55:1-55:2, New York, NY, USA, 2014. ACM.
[9] S. Goff, M. Vaughn, S. McKay, E. Lyons,
A. Stapleton, D. Gessler, N. Matasci, L. Wang,
M. Hanlon, A. Lenards, A. Muir, N. Merchant,
S. Lowry, S. Mock, M. Helmke, A. Kubach,
M. Narro, N. Hopkins, D. Micklos, U. Hilgert,
M. Gonzales, C. Jordan, E. Skidmore, R. Doo-
ley, J. Cazes, R. McLay, Z. Lu, S. Pasternak,
L. Koesterke, W. Piel, R. Grene, C. Noutsos,
K. Gendler, X. Feng, C. Tang, M. Lent, S.-
j. Kim, K. Kvilekval, B. Manjunath, V. Tan-
nen, A. Stamatakis, M. Sanderson, S. Welch,
K. Cranston, P. Soltis, D. Soltis, B. O’Meara,
C. Ane, T. Brutnell, D. Kleibenstein, J. White,
J. Leebens-Mack, M. Donoghue, E. Spalding,
T. Vision, C. Myers, D. Lowenthal, B. Enquist,
B. Boyle, A. Akoglu, G. Andrews, S. Ram,
D. Ware, L. Stein, and D. Stanzione. The iPlant Collaborative: Cyberinfrastructure for plant biology. Frontiers in Plant Science, 2:34, 2011.
[10] IBM. World community grid.
https://www.worldcommunitygrid.org/.
[11] D. Jacobsen and R. Canon. Shifter: Containers for HPC. In Cray User Group Conference (CUG '16), 2016.
[12] G. M. Kurtzer, V. Sochat, and M. W. Bauer.
Singularity: Scientific containers for mobility of
compute. PLOS ONE, 12(5):1–20, 05 2017.
[13] S. M. Larson, C. D. Snow, M. Shirts, and V. S. Pande. Folding@home and Genome@home: Using distributed computing to tackle previously intractable problems in computational biology. arXiv preprint arXiv:0901.0866, 2009.
[14] N. Merchant, E. Lyons, S. Goff, M. Vaughn, D. Ware, D. Micklos, and P. Antin. The iPlant Collaborative: Cyberinfrastructure for enabling data to discovery for the life sciences. PLOS Biology, 14(1):1-9, 2016.
[15] D. Merkel. Docker: Lightweight linux contain-
ers for consistent development and deployment.
Linux J., 2014(239), Mar. 2014.
[16] O. Nov, O. Arazy, and D. Anderson. Dusting for
science: Motivation and participation of digital
citizen science volunteers. In Proceedings of the
2011 iConference, iConference ’11, pages 68–74,
New York, NY, USA, 2011. ACM.
[17] W. R. Pearson. Using the FASTA Program to
Search Protein and DNA Sequence Databases,
pages 307–331. Humana Press, Totowa, NJ,
1994.
[18] C. Proctor, D. Gignac, R. McLay, S. Liu, D. James, T. Minyard, and D. Stanzione. Lonestar 5: Customizing the Cray XC40 software environment.
[19] M. Shirts and V. S. Pande. Screen savers of
the world unite! Science, 290(5498):1903–1904,
2000.
[20] J. E. Smith and R. Nair. The architecture of
virtual machines. Computer, 38(5):32–38, May
2005.
[21] D. Stanzione, B. Barth, N. Gaffney, K. Gaither,
C. Hempel, T. Minyard, S. Mehringer, E. Wern-
ert, H. Tufo, D. Panda, and P. Teller. Stampede
2: The evolution of an xsede supercomputer. In
Proceedings of the Practice and Experience in
Advanced Research Computing 2017 on Sustain-
ability, Success and Impact, PEARC17, pages
15:1–15:8, New York, NY, USA, 2017. ACM.
[22] C. A. Stewart, T. M. Cockerill, I. Foster, D. Han-
cock, N. Merchant, E. Skidmore, D. Stanzione,
J. Taylor, S. Tuecke, G. Turner, M. Vaughn, and
N. I. Gaffney. Jetstream: A self-provisioned,
scalable science and engineering cloud environ-
ment. In Proceedings of the 2015 XSEDE Con-
ference: Scientific Advancements Enabled by En-
hanced Cyberinfrastructure, XSEDE ’15, pages
29:1–29:8, New York, NY, USA, 2015. ACM.
[23] D. Thain, T. Tannenbaum, and M. Livny. Dis-
tributed computing in practice: the condor expe-
rience. Concurrency and Computation: Practice
and Experience, 17(2-4):323–356, 2005.
[24] I. Toby, S. Christley, W. Scarborough, W. H. Rounds, J. Fonner, S. Mock, N. Monson, R. H. Scheuermann, and L. G. Cowell. VDJServer: A web-accessible analysis portal for immune repertoire sequencing analysis. The Journal of Immunology, 198(1 Supplement):55.49, 2017.
[25] U. Viswanathan, S. M. Tomlinson, J. M. Fon-
ner, S. A. Mock, and S. J. Watowich. Identifica-
tion of a novel inhibitor of dengue virus protease
through use of a virtual screening drug discov-
ery web portal. Journal of Chemical Information
and Modeling, 54(10):2816–2825, 2014. PMID:
25263519.
[26] D. W. Walker. The design of a standard message
passing interface for distributed memory concur-
rent computers. Parallel Computing, 20(4):657
– 673, 1994.
[27] J. Watson. VirtualBox: Bits and bytes masquerading as machines. Linux J., 2008(166), Feb. 2008.
[28] L. A. Wilson. Using Managed High Performance
Computing Systems for High-Throughput Com-
puting, pages 61–79. Springer International Pub-
lishing, Cham, 2016.
[29] L. A. Wilson. Using the launcher for execut-
ing high throughput workloads. Big Data Re-
search, 8:57 – 64, 2017. Tutorials on Tools and
Methods using High Performance Computing re-
sources for Big Data.
[30] L. A. Wilson and J. M. Fonner. Launcher: A
Shell-based Framework for Rapid Development
of Parallel Parametric Studies. In Proceedings
of the 2014 Annual Conference on Extreme Sci-
ence and Engineering Discovery Environment,
XSEDE ’14, pages 40:1–40:8, New York, NY,
USA, 2014. ACM.
[31] L. A. Wilson, J. M. Fonner, J. Allison, O. Es-
teban, H. Kenya, and M. Lerner. Launcher: A
simple tool for executing high throughput com-
puting workloads. The Journal of Open Source
Software, 2(16), aug 2017.
[32] A. B. Yoo, M. A. Jette, and M. Grondona.
SLURM: Simple Linux Utility for Resource
Management, pages 44–60. Springer Berlin Hei-
delberg, Berlin, Heidelberg, 2003.