Conference Proceeding

Extending GridSim with an architecture for failure detection

Dept. of Comput. Syst., Castilla La Mancha Univ., Castilla La Mancha
01/2008; DOI:10.1109/ICPADS.2007.4447756 ISBN: 978-1-4244-1889-3 In proceedings of: Parallel and Distributed Systems, 2007 International Conference on, Volume: 2
Source: DBLP

ABSTRACT Grid technologies are emerging as the next generation of distributed computing, allowing the aggregation of geographically distributed resources. However, these resources are independent and are managed separately by various organizations with different policies. This has a major impact on users who submit their jobs to the Grid, as they have to deal with issues such as policy heterogeneity, security and fault tolerance. Moreover, changes in Grid conditions, such as resources that become unavailable for a period of time due to maintenance or that suffer failures, can significantly affect users' quality of service (QoS) requirements. Therefore, it is essential for users to take into account the effects of resource failures during job execution. In this paper, we present our work on introducing resource failures and failure detection into the GridSim simulation toolkit. Because we need to conduct repeatable and controlled experiments, simulation is an easier means of studying complex scenarios. We also give a detailed description of the overall design and a use case scenario demonstrating how the conditions of resources vary over time.

  • Article: GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing
    ABSTRACT: Clusters, grids, and peer-to-peer (P2P) networks have emerged as popular paradigms for next generation parallel and distributed computing. They enable aggregation of distributed resources for solving large-scale problems in science, engineering, and commerce. In grid and P2P computing environments, the resources are usually geographically distributed in multiple administrative domains, managed and owned by different organizations with different policies, and interconnected by wide-area networks or the Internet.
    05/2002;
  • Conference Proceeding: Design and evaluation of a decentralized system for grid-wide fairshare scheduling
    ABSTRACT: This contribution presents a decentralized architecture for a grid-wide fairshare scheduling system and demonstrates its potential in a simulated environment. The system, which preserves local site autonomy, enforces locally and globally scoped share policies, allowing local resource capacity as well as global grid capacity to be logically divided across different groups of users. The policy model is hierarchical and subpolicy definition can be delegated so that, e.g., a VO that has been granted a resource share can partition its share across its projects, which in turn can divide their shares between project members. There is no need for a central coordinator as policies are enforced collectively by the resource schedulers. Each local scheduler adopts a grid-wide view on utilization in order to steer local resource utilization to not only maintain local resource shares but also to contribute to maintaining global shares across the entire set of grid resources. Share enforcement is addressed by an algorithm that calculates simple priority values, thus simplifying integration with local schedulers, which can remain unaware of the hierarchical share policy structure.
    e-Science and Grid Computing, 2005. First International Conference on; 01/2006
  • Conference Proceeding: The anatomy of the grid: enabling scalable virtual organizations
    Cluster Computing and the Grid, 2001. Proceedings. First IEEE/ACM International Symposium on; 02/2001


Keywords

conduct repeatable
detailed description
different locations
different policies
failure detection
failures
Grid conditions
Grid technologies
GridSim simulation toolkit
jobs
jobs execution
major impact
next generation
resource failures
resources varied
unavailable
use case scenario
use simulation
users
various organizations