Nomadic Migration: A New Tool for Dynamic Grid Computing.
ABSTRACT We describe the design and implementation of a technology that gives an application the ability to seek out and exploit remote computing resources by migrating tasks from site to site, dynamically adapting the application to a changing Grid environment. The motivation for this migration framework, dubbed "The Worm", originated from the experience of having an abundance of computing time for simulations that is distributed over multiple sites and split into time chunks by queuing systems. We describe the architecture of the Worm, explaining how new or more suitable resources are located and how the payload simulation is migrated to these resources following a trigger event. The migration technology presented here is designed to be used for any application, including large-scale HPC simulations.
Gerd Lanfermann, Gabrielle Allen, Thomas Radke, Edward Seidel
Grid computing involves utilizing computational resources,
connected by networks, as needed to solve problems. Recent
advances in Grid computing are such that applications are
now in a position to begin to exploit a wide range of
available computer resources, simultaneously, sequentially,
or both, enabling many different new and innovative Grid
usage scenarios.
Adding up the theoretically available computing time
across a pool of standard computers, such as idle
workstations, or summing the total computing time granted to
a research group by several independent supercomputing
sites will typically yield an impressive amount of
processing power. But these resources are neither available
on a homogeneous architecture base nor are they all
continuously accessible over a long period of time.
Here we have focussed on a new type of Grid computing
appropriate for its dynamic character: self-determined
migration of a simulation from one site or collection of
sites to any other. We present a migration technology,
dubbed “the Worm”, designed for parallel applications with
high IO and memory requirements, driven by the need for
performing large scale simulations on HPC machines. The
technology is also applicable for the efficient use of idle
cycles from small machine pools.
G. Lanfermann, G. Allen, T. Radke and E. Seidel are with the
Max-Planck-Institut für Gravitationsphysik,
Albert-Einstein-Institut, Golm (AEI). E. Seidel is also with
the National Center for Supercomputing Applications,
Champaign, IL (NCSA).
0-7695-1296-8/01 $10.00 © 2001 IEEE
[Figure: component diagram; legible labels include "Data Access" and "Off-Site Data Storage"]
Fig. 1. We show the main components of a resource aware,
self-replicating Worm. The user payload is encapsulated by
the Worm Kernel, acting as a contact point to the resource
detector and various application information servers (AIS).
The transfer units stage executables to machines and provide
storage for checkpoint files during
A prototype implementation of the Worm was already
demonstrated at Supercomputing 2000, running across the
machines of the EGrid Testbed. The Worm was implemented in
the Cactus programming framework [1]. The Worm’s payload,
which describes the simulation code provided by scientists,
was a simulation of a wave equation, although any real
application written in the Cactus framework can trivially
be incorporated as a payload.
The participating EGrid sites published characteristic
system profiles and load information to a central Resource
Manager. While the prototype Worm simulation was running on
machines in the Grid, it was able to query this central
resource service, seeking available machines of a certain
configuration. Using this information it could then migrate
to another site according to some predefined criteria.
III. WORM TECHNOLOGY IN A DYNAMIC
Based on the early prototype experiences, a more
sophisticated Worm framework was designed to provide the
user with a range of migration policies and to overcome
numerous technical challenges when dealing with
heterogeneous grid environments. With the new Worm
technology, application migration can be initiated in a
variety of ways, which range from the user’s manual
triggering of a migration event to fully automatic
application relocation: by monitoring the simulation
performance and profiling the current hardware with small
benchmarking programs, a detailed profile of the current
execution environment is generated.
Periodic lookups are performed to see if “better”
resources for this individual profile have become
available, in which case a migration to the better suited
resource may be initiated. Note that in this context
“better” does not necessarily mean “faster”; it can also
mean “cheaper”, “more storage”, “less queue waiting” or
“better network”.
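One way to read the notion of a “better” resource is as a
policy-dependent comparison of two profiles. The Python
sketch below is illustrative only: the profile fields, the
policy names, the function is_better and the relative margin
are all hypothetical and are not taken from the Worm
implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical execution-environment profile (all fields are assumptions)."""
    mflops: float          # e.g. measured by a small benchmark program
    cost_per_hour: float   # accounting cost of the resource
    free_disk_gb: float    # storage available for output and checkpoints
    queue_wait_min: float  # expected waiting time in the batch queue


# Each policy maps a profile to a score where a larger value means "better".
POLICIES = {
    "faster": lambda p: p.mflops,
    "cheaper": lambda p: -p.cost_per_hour,
    "more_storage": lambda p: p.free_disk_gb,
    "less_queue_wait": lambda p: -p.queue_wait_min,
}


def is_better(candidate, current, policy, margin=0.1):
    """Return True if candidate beats current by more than a relative margin."""
    score = POLICIES[policy]
    cur, cand = score(current), score(candidate)
    return cand - cur > margin * abs(cur)
```

The margin is a design choice of this sketch: requiring a
noticeable improvement before migrating avoids moving the
simulation back and forth between nearly equivalent sites.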
Since the Worm migrates between machines in a
non-predictable fashion, reacting to the dynamic nature of a
Grid, it requires a mechanism for tracking the current and
past locations of the Worm. This is handled with different
degrees of finesse, for example, by publishing the
information to a centralized Application Information Service
(AIS) or an email/SMS notification server.
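A minimal sketch of such location tracking is given below.
It assumes that every move is appended to a history and
forwarded to a set of publisher callbacks, which stand in
for an AIS client or a notification server. The class
LocationTracker and its methods are hypothetical names
introduced for illustration.

```python
import time


class LocationTracker:
    """Keeps the migration history and forwards each move to publishers."""

    def __init__(self):
        self.history = []      # (timestamp, site) tuples, oldest first
        self.publishers = []   # callables invoked with each new entry

    def add_publisher(self, publish):
        """Register a callable, e.g. a stand-in for an AIS or email client."""
        self.publishers.append(publish)

    def record_move(self, site, timestamp=None):
        """Store the new location and notify every registered publisher."""
        entry = (time.time() if timestamp is None else timestamp, site)
        self.history.append(entry)
        for publish in self.publishers:
            publish(entry)

    def current_site(self):
        """Return the most recent site, or None before the first move."""
        return self.history[-1][1] if self.history else None
```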
Before migrating, the Worm must have located the next
resource by querying a remote Resource Broker (RB), which
tracks available computing resources, obtaining load and
other information from registered sites. Different RB
formats, developed by e.g. GrADS [8] and groups within the
EGrid, are understood. If a suitable resource cannot be
found, the simulation hibernates by writing a checkpoint,
which is stored until appropriate machines become available
to host the restarted simulation.
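The migrate-or-hibernate decision described above can be
sketched as follows. The record layout (keys "site", "cpus",
"memory_gb" and "load") and both function names are
assumptions made for this example; they do not correspond
to the GrADS or EGrid broker formats.

```python
def choose_next_site(broker_records, min_cpus, min_memory_gb):
    """Return the least-loaded site that meets the requirements, or None."""
    suitable = [
        record for record in broker_records
        if record["cpus"] >= min_cpus and record["memory_gb"] >= min_memory_gb
    ]
    if not suitable:
        return None
    return min(suitable, key=lambda record: record["load"])["site"]


def next_action(broker_records, min_cpus, min_memory_gb):
    """Decide between migrating to a site and hibernating on a checkpoint."""
    site = choose_next_site(broker_records, min_cpus, min_memory_gb)
    if site is None:
        # No suitable resource: the checkpoint is kept until one appears.
        return ("hibernate", None)
    return ("migrate", site)
```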
The Worm must provide the capability to access the
different sites without user interaction to copy checkpoint
and parameter files, start processes and handle output
data. It is essential to provide a secure but easy way to
interface with these resources. Our Worm supports methods
such as Globus GSI technology [S] or, more simply, secure
shell and secure copy.
The Worm application (including the user provided payload
simulation) must be available on the different
heterogeneous machines of the user’s virtual grid. We
support repositories of pre-built binaries and automated
compiling “on-the-fly” before execution. To restore the
simulation state in a heterogeneous machine environment,
the checkpoint files are coded in an architecture
independent format. We use HDF5 [11] and the CactusCode
framework [1] to meet these requirements.
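To illustrate what an architecture independent encoding
means in practice, the sketch below writes a checkpoint with
a fixed byte order and fixed field sizes using Python's
standard struct module. This is a simplified stand-in and
not the HDF5 format; the MAGIC tag, the record layout and
both function names are assumptions made for this example.

```python
import struct

MAGIC = b"CKPT"        # hypothetical 4-byte tag identifying the blob
HEADER = ">4sqI"       # big-endian: tag, 64-bit iteration, 32-bit count


def pack_checkpoint(iteration, values):
    """Encode an iteration counter and a list of floats with a fixed byte order."""
    header = struct.pack(HEADER, MAGIC, iteration, len(values))
    body = struct.pack(">%dd" % len(values), *values)
    return header + body


def unpack_checkpoint(blob):
    """Decode a blob written by pack_checkpoint on any architecture."""
    magic, iteration, count = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("not a checkpoint blob")
    offset = struct.calcsize(HEADER)
    values = list(struct.unpack_from(">%dd" % count, blob, offset))
    return iteration, values
```

Because the ">" prefix selects big-endian byte order with
standard sizes and no padding, the bytes do not depend on
the machine that wrote them, which is the property the
checkpoint files need when the restart happens on a
different architecture.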
[Figure: timeline diagram; legible label: "executed in Queue"]
Fig. 2. The timeline of migration events between three
sites is shown. The first two relocations involve
hibernation of the application. In the third case an
advanced reservation scheduler is used to request
overlapping resources, which allows checkpoints to be
streamed directly.
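The distinction between the two kinds of relocation in the
timeline can be expressed as a test for overlapping time
slots. In the sketch below a slot is a (start, end) pair in
arbitrary time units, and the next slot is assumed to start
no earlier than the current one; the function name
plan_transfer is hypothetical.

```python
def plan_transfer(current_slot, next_slot):
    """Return ("stream", overlap) or ("hibernate", gap) for two time slots."""
    cur_start, cur_end = current_slot
    nxt_start, nxt_end = next_slot
    overlap = min(cur_end, nxt_end) - max(cur_start, nxt_start)
    if overlap > 0:
        # Both allocations are live at once: stream the checkpoint directly.
        return ("stream", overlap)
    # Otherwise the checkpoint must be stored until the next slot begins.
    return ("hibernate", nxt_start - cur_end)
```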
Automating and optimizing the usage of multiple resources
is an essential challenge to Grid-enabling application
software. We have described a technology that enables an
application on its own to seek out and exploit
computational resources on the Grid. This “Worm” approach
not only provides applications with the ability to make
decisions about resource usage and to self-migrate to new
machines if necessary, it also takes into account the
heterogeneous nature of resources as well as their dynamic
availability in time.
Note that although we have spoken in terms of migrating
an entire application from site to site, future Grid
applications will be able to take advantage of work flow
parallelism: e.g. analysis tasks may be spawned off to
other grid resources. The Worm technology paves the way for
these advanced and intelligent Grid applications. There are
many possible uses that will be discussed elsewhere [10].
The development of the EGrid Worm is a highly
collaborative effort, and we are indebted to a great many
experts at different institutions, especially on the EGrid,
for their advice and support. It is a pleasure for us to
thank, above all, Tom Goodale and John Shalf, as well as Ian
Foster, Sridhar Gullapalli, Steve Fitzgerald and the Globus
team at ANL for their Globus and Data Grid work, and Mike
Folk and his HDF5 development group at NCSA. Computing
resources and technical support have been provided by the
EGrid, AEI, NCSA, ANL, and ZIB. We have also benefitted
from close association with and partial support of the ASC
project, NSF PHY-9979985.
[1] Cactus Code: http://www.cactuscode.org
[2] Allen, G., Goodale, T., Lanfermann, G., Seidel, E., Benger, W.,
Hege, H.-C., Merzky, A., Massó, J., Radke, T. and Shalf, J.,
Solving Einstein’s Equation on Supercomputers, IEEE Computer,
p.52-59, December, 1999. http://www.computer.org/computer/
[3] Seidel, E. and Suen, W.M., J. Comp. Appl. Math., 109,
[4] G. Allen, T. Dramlitsch, G. Lanfermann, E. Seidel, Efficient
Techniques for Distributed Computing, submitted to HPDC10.
[5] G. Allen, W. Benger, T. Goodale, H. Hege, G. Lanfermann, A.
Merzky, T. Radke, E. Seidel, J. Shalf, “Cactus Tools for Grid
Applications”, to appear in Cluster Computing, (2001).
[6] Globus Metacomputing Toolkit: http://www.globus.org/
[7] The European Grid-Forum: http://www.egrid.org
[8] Grid Application Development
http://www.isi.edu/grads
[9] G. Allen, T. Dramlitsch, T. Goodale, G. Lanfermann, T. Radke,
E. Seidel, T. Kielmann, K. Verstoep, Z. Balaton, P. Kacsuk,
F. Szalai, J. Gehring, A. Keller, A. Streit, L. Matyska, M. Ruda,
A. Krenek, H. Frese, H. Knipp, A. Merzky, A. Reinefeld,
F. Schintke, B. Ludwiczak, J. Nabrzyski, J. Pukacki, H.-P. Kersken,
and M. Russell, Early experiences with the Egrid testbed, in IEEE
International Symposium on Cluster Computing and the Grid,
[10] G. Allen, J. Foster, T. Goodale, G. Lanfermann, T. Radke,
M. Russell, E. Seidel, J. Shalf, Grid Computing: An
Applications Perspective (in preparation)
[11] Hierarchical Data Format Version 5
http://hdf.ncsa.uiuc.edu/HDF5