Conference Paper

Nomadic Migration: A New Tool for Dynamic Grid Computing.

Albert-Einstein-Inst., Max-Planck-Inst. für Gravitationsphys.
DOI: 10.1109/HPDC.2001.945211
Conference: 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10 2001), 7-9 August 2001, San Francisco, CA, USA
Source: DBLP

ABSTRACT: We describe the design and implementation of a technology that gives an application the ability to seek out and exploit remote computing resources by migrating tasks from site to site, dynamically adapting the application to a changing Grid environment. The motivation for this migration framework, dubbed "The Worm", originated from the experience of having an abundance of computing time for simulations, which is distributed over multiple sites and split into time chunks by queuing systems. We describe the architecture of the Worm, including how new or more suitable resources are located and how the payload simulation is migrated to these resources following a trigger event. The migration technology presented here is designed to be used for any application, including large-scale HPC simulations.

Available from: Harry Edward Seidel, Aug 28, 2015
    • "This paper seeks to provide for a system true to the peer-to-peer architecture, where every member of the network can both offer resources and utilize those of other members. Lanfermann et al. describe work which meets these criteria [9], [10]. The system they discuss allows executing entities to migrate from one machine to another, as more desirable resources become available."
    ABSTRACT: Resource discovery is the process of locating shared resources on a computer network. Previously studied examples include efficiently finding files with a given title on a file sharing system. New developments in the application of networked computers raise the issue of dynamic resource discovery, the process of locating shared resources that are always changing. An example application is peer-to-peer computing, where a user wishes to locate idle CPU time anywhere on the network. Peer-to-peer computing is an exciting new computing paradigm. There are vast amounts of idle CPU resources scattered across the globe. We envision a peer-to-peer system to harness those resources, where every member of the network can both share their own CPU and utilize others' CPUs. In a network of hundreds of thousands of computers, resource discovery will play an important role. To avoid debilitating amounts of excess network traffic, it is imperative that an efficient resource discovery algorithm be chosen. This paper's contribution to this topic is the use of gossip to reduce network traffic without sacrificing effectiveness. This project has investigated piggybacking gossip messages on other communications to increase the intelligence of searching protocols. The overhead of piggybacking the small amount of data needed is very small, and a case study by simulation shows that it can reduce network traffic by 71-84 percent.
    International Conference on Parallel Processing (ICPP 2006); 09/2006
    • "This paper seeks to provide for a system true to the peer-to-peer architecture, where every member of the network can both offer resources and utilize those of other members. Lanfermann et al. describe work which meets these criteria [9], [10]. The system they discuss allows executing entities to migrate from one machine to another, as more desirable resources become available."
    ABSTRACT: Thesis (M.S.)--Michigan Technological University, 2006. Includes bibliographical references.
    • "In addition, the ability to migrate running jobs to more suitable resources based on events dynamically generated by both the Grid and the running applications (adaptive execution), can also improve the performance and fault tolerance obtained by applications on a Grid [9]. The support for adaptive execution of the GridWay framework is discussed in Section 4; and then, in Section 5, we describe the GridWay facilities to provide job execution with fault tolerance."
    ABSTRACT: Grids offer a dramatic increase in the number of available compute and storage resources that can be delivered to applications. This new computational infrastructure provides a promising platform to execute loosely coupled, high-throughput parameter sweep applications. Applications of this kind arise naturally in many scientific and engineering fields, such as bioinformatics, computational fluid dynamics (CFD), and particle physics. The efficient execution and scheduling of parameter sweep applications is challenging because of the dynamic and heterogeneous nature of grids. We present a scheduling algorithm built on top of the GridWay framework that combines: (i) adaptive scheduling to reflect the dynamic grid characteristics; (ii) adaptive execution to migrate running jobs to better resources and provide fault tolerance; (iii) re-use of common files between tasks to reduce the file transfer overhead. The efficiency of the approach is demonstrated in the execution of a CFD application on a highly heterogeneous research testbed.
    12th Euromicro Workshop on Parallel, Distributed and Network-Based Processing (PDP 2004), 11-13 February 2004, A Coruña, Spain; 01/2004