Conference Paper

Nomadic Migration: A New Tool for Dynamic Grid Computing.

Albert-Einstein-Inst., Max-Planck-Inst. fur Gravitationsphys.;
DOI: 10.1109/HPDC.2001.945211 Conference: 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10 2001), 7-9 August 2001, San Francisco, CA, USA
Source: DBLP

ABSTRACT We describe the design and implementation of a technology which provides an application with the ability to seek out and exploit remote computing resources by migrating tasks from site to site, dynamically adapting the application to a changing Grid environment. The motivation for this migration framework, dubbed "The Worm", originated from the experience of having an abundance of computing time for simulations, which is distributed over multiple sites and split in time chunks by queuing systems. We describe the architecture of the Worm, describing how new or more suitable resources are located, and how the payload simulation is migrated to these resources following a trigger event. The migration technology presented here is designed to be used for any application, including large-scale HPC simulations

0 Bookmarks
 · 
53 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Thesis (M.S.)--Michigan Technological University, 2006. Includes bibliographical references.
    01/2006;
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Impact cratering is an important geological process of special interest in Astrobiology. Its numerical simulation comprises the execution of a high number of tasks, since the search space of input parameter values includes the projectile diameter, the water depth and the impactor velocity. Furthermore, the execution time of each task is not uniform because of the different numerical properties of each experimental configuration. Grid technology is a promising platform to execute this kind of applications, since it provides the end user with a performance much higher than that achievable on any single organization. However, the scheduling of each task on a Grid involves challenging issues due to the unpredictable and heterogeneous behavior of both the Grid and the numerical code. This paper evaluates the performance of a Grid infrastructure based on the Globus toolkit and the GridWay framework, which provides the adaptive and fault tolerance functionality required to harness Grid resources, in the simulation of the impact cratering process. The experiments have been performed on a testbed composed of resources shared by five sites interconnected by RedIRIS, the Spanish Research and Education Network.
    Scientific Programming 01/2005; 13:19-30. · 1.04 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: In recent years, there has been a dramatic increase in available compute capacities. However, these ``Grid resources'' are rarely accessible in a continuous stream, but rather appear scattered across various machine types, platforms and operating systems, which are coupled by networks of fluctuating bandwidth. It becomes increasingly difficult for scientists to exploit available resources for their applications. We believe that intelligent, self-governing applications should be able to select resources in a dynamic and heterogeneous environment: Migrating applications determine a resource when old capacities are used up. Spawning simulations launch algorithms on external machines to speed up the main execution. Applications are restarted as soon as a failure is detected. All these actions can be taken without human interaction. A distributed compute environment possesses an intrinsic unreliability. Any application that interacts with such an environment must be able to cope with its failing components: deteriorating networks, crashing machines, failing software. We construct a reliable service infrastructure by endowing a service environment with a peer-to-peer topology. This ``Grid Peer Services'' infrastructure accommodates high-level services like migration and spawning, as well as fundamental services for application launching, file transfer and resource selection. It utilizes existing Grid technology wherever possible to accomplish its tasks. An Application Information Server acts as a generic information registry to all participants in a service environment. The service environment that we developed, allows applications e.g. to send a relocation requests to a migration server. The server selects a new computer based on the transmitted resource requirements. It transfers the application's checkpoint and binary to the new host and resumes the simulation. Although the Grid's underlying resource substrate is not continuous, we achieve persistent computations on Grids by relocating the application. We show with our real-world examples that a traditional genome analysis program can be easily modified to perform self-determined migrations in this service environment. In den vergangenen Jahren ist es zu einer dramatischen Vervielfachung der verfügbaren Rechenzeit gekommen. Diese 'Grid Ressourcen' stehen jedoch nicht als kontinuierlicher Strom zur Verfügung, sondern sind über verschiedene Maschinentypen, Plattformen und Betriebssysteme verteilt, die jeweils durch Netzwerke mit fluktuierender Bandbreite verbunden sind. Es wird für Wissenschaftler zunehmend schwieriger, die verfügbaren Ressourcen für ihre Anwendungen zu nutzen. Wir glauben, dass intelligente, selbstbestimmende Applikationen in der Lage sein sollten, ihre Ressourcen in einer dynamischen und heterogenen Umgebung selbst zu wählen: Migrierende Applikationen suchen eine neue Ressource, wenn die alte aufgebraucht ist. 'Spawning'-Anwendungen lassen Algorithmen auf externen Maschinen laufen, um die Hauptanwendung zu beschleunigen. Applikationen werden neu gestartet, sobald ein Absturz endeckt wird. Alle diese Verfahren können ohne menschliche Interaktion erfolgen. Eine verteilte Rechenumgebung besitzt eine natürliche Unverlässlichkeit. Jede Applikation, die mit einer solchen Umgebung interagiert, muss auf die gestörten Komponenten reagieren können: schlechte Netzwerkverbindung, abstürzende Maschinen, fehlerhafte Software. Wir konstruieren eine verlässliche Serviceinfrastruktur, indem wir der Serviceumgebung eine 'Peer-to-Peer'-Topology aufprägen. Diese ``Grid Peer Service'' Infrastruktur beinhaltet Services wie Migration und Spawning, als auch Services zum Starten von Applikationen, zur Dateiübertragung und Auswahl von Rechenressourcen. Sie benutzt existierende Gridtechnologie wo immer möglich, um ihre Aufgabe durchzuführen. Ein Applikations-Information- Server arbeitet als generische Registratur für alle Teilnehmer in der Serviceumgebung. Die Serviceumgebung, die wir entwickelt haben, erlaubt es Applikationen z.B. eine Relokationsanfrage an einen Migrationsserver zu stellen. Der Server sucht einen neuen Computer, basierend auf den übermittelten Ressourcen-Anforderungen. Er transferiert den Statusfile des Applikation zu der neuen Maschine und startet die Applikation neu. Obwohl das umgebende Ressourcensubstrat nicht kontinuierlich ist, können wir kontinuierliche Berechnungen auf Grids ausführen, indem wir die Applikation migrieren. Wir zeigen mit realistischen Beispielen, wie sich z.B. ein traditionelles Genom-Analyse-Programm leicht modifizieren lässt, um selbstbestimmte Migrationen in dieser Serviceumgebung durchzuführen.
    06/2003;

Full-text (2 Sources)

View
26 Downloads
Available from
May 21, 2014