Checkpointing in CosMiC: a User-level Process Migration Environment
ABSTRACT The CosMiC system is a user-level process migration environment. Process migration is defined as the mechanism to checkpoint the state of an unfinished process, transfer the state from one machine to another, and resume process execution on the new machine. The main purposes of process migration are (1) to utilize CPU power and balance load across all machines in an environment; and (2) to provide fault tolerance by migrating a process from a failed machine to another machine. CosMiC provides an extensible architecture that allows an application to choose its own checkpointing mechanism. It is equipped with four checkpoint libraries, namely libckp, libfcp, libft and libst, which provide different strategies for saving and restoring state. Libckp is a transparent checkpoint library that checkpoints the entire process state. It requires minimal user involvement and no modifications to the source code. Libfcp is a file checkpoint library that saves and restores file contents. Libft is a critical ...
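The checkpoint, transfer, and resume cycle defined in the abstract can be illustrated with a minimal sketch. This is not CosMiC's API and it is not the transparent, whole-process approach attributed to libckp; it is an application-level illustration in which the program chooses which state to save. All names (`save_checkpoint`, `load_checkpoint`, `run`, `demo`) are hypothetical, and the two "machines" are simulated with two temporary directories.

```python
import os
import pickle
import shutil
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename over any older checkpoint


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, limit, stop_after=None):
    """Sum the integers 0..limit-1, checkpointing after every step.

    stop_after simulates an interruption: the call returns None once that
    many steps have run, leaving the latest checkpoint on disk.
    """
    state = load_checkpoint(path) or {"i": 0, "total": 0}
    steps = 0
    while state["i"] < limit:
        if stop_after is not None and steps >= stop_after:
            return None
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(path, state)
        steps += 1
    return state["total"]


def demo():
    """Interrupt a job on simulated host A, then resume it on host B."""
    with tempfile.TemporaryDirectory() as host_a, \
            tempfile.TemporaryDirectory() as host_b:
        ckpt_a = os.path.join(host_a, "job.ckpt")
        ckpt_b = os.path.join(host_b, "job.ckpt")
        assert run(ckpt_a, 10, stop_after=4) is None  # unfinished on host A
        shutil.copy(ckpt_a, ckpt_b)                   # transfer the state
        return run(ckpt_b, 10)                        # resume on host B
```

Calling `demo()` returns the same sum an uninterrupted run would produce. The write-then-rename step in `save_checkpoint` is a design choice of this sketch: a failure during checkpointing leaves the previous checkpoint intact rather than a truncated file.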
Source available from: scholar.lib.vt.edu
Wireless Personal Communications 12/2014; 79(3):2089-2125. DOI:10.1007/s11277-014-1975-9 · 0.98 Impact Factor
ABSTRACT: Grid computing is focusing more on resource sharing, cycle stealing, and other modes of collaboration among dynamic and geographically distributed organizations. As a complement to traditional data migration in client/server and distributed computing systems, moving computations to suitable locations for resource sharing becomes an effective alternative. The Grid's characteristics require that computation migration schemes fit the underlying dynamic and heterogeneous computing environments. Although no fully functional system of this kind is available, current migration systems have achieved partial success in different aspects. Major difficulties and issues are identified, and possible solutions are then proposed. In particular, a multi-grained computation migration system, MigThread, is outlined to demonstrate the feasibility of moving computations in Grid-like environments. Possible techniques are discussed to shed light on the system design of next-generation computation migration, which is essential to enabling flexible resource sharing on the Grid.