Conference Paper

Benchmarking of integrated OGSA-BES with the grid middleware

Abstract

This paper evaluates the performance of the emerging OGF standard OGSA Basic Execution Service (BES) on three fundamentally different Grid middleware platforms: UNICORE 5/6, Globus Toolkit 4 and gLite. The particular focus of this paper is the OGSA-BES implementation of UNICORE 6. For UNICORE 6 and Globus Toolkit 4, the results are compared with baseline measurements taken with the legacy job submission interfaces. Our results show that the BES components are comparable in performance to the existing legacy interfaces. There is also a strong indication that other factors, attributable to the supporting infrastructure, have a bigger impact on performance than the BES components.
Hedman, Riedel, Mucci, Netzer, Gholami, M. Memon, A. Memon, Shah
EU project: RIO31844-OMII-EUROPE
Outline
Batch Job Execution Interfaces
Benchmark details
Results
System Level
Component Level
Conclusions
Introduction
Most Grids offer services to execute batch jobs.
Middleware traditionally used proprietary protocols to
provide these services.
OGF standardized job submission in the BES
recommendation.
The OMII-Europe project developed BES
implementations for three middleware stacks (Globus,
gLite and UNICORE).
This benchmarking effort targets these new services and
aims to provide information about the performance of
the developed solutions.
UNICORE Atomic Services (UAS)
Exposes core functionality via Web-Service interfaces
Specific to the UNICORE 6 stack
Follows OASIS WS-RF pattern
Provides job control, data management and file transfer
We only consider job submission and control
Target System Service for job submission
Job service to manage job
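The two-step UAS workflow can be sketched as a minimal in-memory mock. The classes `TargetSystem` and `Job`, their method names and the status strings are hypothetical stand-ins, not the UNICORE 6 client API (which talks WS-RF over SOAP); the sketch only shows that a UAS job is created by the Target System Service and must then be started explicitly through its Job service.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a UAS Job resource."""
    job_id: str
    status: str = "READY"  # created, but not running until start() is called

    def start(self) -> None:
        # UAS requires an explicit start request after submission.
        if self.status == "READY":
            self.status = "SUCCESSFUL"  # a 0-length job completes immediately


@dataclass
class TargetSystem:
    """Hypothetical stand-in for the UAS Target System Service."""
    jobs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def submit(self, job_description: str) -> Job:
        # Submission only creates the Job resource and returns a handle to it.
        job = Job(job_id=f"job-{next(self._ids)}")
        self.jobs[job.job_id] = job
        return job


if __name__ == "__main__":
    tss = TargetSystem()
    job = tss.submit("<jsdl:JobDefinition/>")  # step 1: create the job
    print(job.status)  # READY
    job.start()                                 # step 2: start it explicitly
    print(job.status)  # SUCCESSFUL
```

The extra start request is one plausible reason, suggested later in the slides, why UAS is slower than BES for single-threaded submission.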
Basic Execution Service (BES)
Web-Services based interface for job submission
Standardized by the Open Grid Forum
Defines two port types plus an optional extension:
BES Factory for job creation and bulk job management
BES Activity (optional extension) for management of a single job
BES Management for service management tasks
Does not specify a security mechanism:
UNICORE 6 uses SAML and optionally VOMS
Globus Toolkit uses proxy certificates
gLite uses VOMS proxy certificates
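For comparison, the BES side can be sketched as an in-memory mock. The method names mirror the BES operations CreateActivity, GetActivityStatuses and TerminateActivities (BES Factory) and StopAcceptingNewActivities/StartAcceptingNewActivities (BES Management); the class itself, the id format and the use of `RuntimeError` are assumptions, and there is deliberately no security layer because BES leaves that to each middleware stack.

```python
import itertools
from enum import Enum


class ActivityState(Enum):
    """Basic states of the BES activity state model."""
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


class MockBES:
    """Hypothetical in-memory BES endpoint: no SOAP, no security layer."""

    def __init__(self):
        self._ids = itertools.count()
        self._activities = {}
        self.accepting = True

    # BES Factory port type
    def create_activity(self, jsdl: str) -> str:
        if not self.accepting:
            # Stands in for the SOAP fault a real BES returns in this case.
            raise RuntimeError("NotAcceptingNewActivities")
        activity_id = f"activity-{next(self._ids)}"
        # No separate start call: the activity is queued as soon as it is created.
        self._activities[activity_id] = ActivityState.PENDING
        return activity_id

    def get_activity_statuses(self, activity_ids):
        # Bulk query: one request returns the state of many activities.
        return {a: self._activities[a] for a in activity_ids}

    def terminate_activities(self, activity_ids):
        results = {}
        for a in activity_ids:
            active = self._activities[a] in (ActivityState.PENDING, ActivityState.RUNNING)
            if active:
                self._activities[a] = ActivityState.CANCELLED
            results[a] = active  # True if the activity was actually terminated
        return results

    # BES Management port type
    def stop_accepting_new_activities(self):
        self.accepting = False

    def start_accepting_new_activities(self):
        self.accepting = True
```

Unlike the UAS mock, a single `create_activity` call is enough to get a job queued.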
BES Implementations
Three implementations were provided by OMII-Europe
Independent Services
UNICORE BES on top of the XNJS backend
gLite CREAM-BES as plugin for the CREAM-CE
Wrapper/Adapter approach
Globus BES as a wrapper for a WS-GRAM service
CROWN Metascheduler
Can submit jobs to multiple BES instances.
Provides its own BES interface.
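The metascheduler pattern (one BES front end forwarding to several BES back ends) can be sketched as follows. The round-robin placement, the `(backend index, local id)` activity identifiers and the `EchoBES` back end are illustrative assumptions; the slides do not describe how CROWN selects a target.

```python
import itertools


class EchoBES:
    """Hypothetical minimal back end exposing two BES-like calls."""

    def __init__(self):
        self._ids = itertools.count()
        self.states = {}

    def create_activity(self, jsdl):
        local_id = str(next(self._ids))
        self.states[local_id] = "Pending"
        return local_id

    def get_activity_statuses(self, local_ids):
        return {i: self.states[i] for i in local_ids}


class MetaScheduler:
    """Offers the same calls as a BES endpoint and forwards each new
    activity to one of several back-end BES instances (round-robin here)."""

    def __init__(self, backends):
        self.backends = backends
        self._next_backend = itertools.cycle(range(len(backends)))

    def create_activity(self, jsdl):
        index = next(self._next_backend)
        local_id = self.backends[index].create_activity(jsdl)
        # The global id records which back end owns the activity.
        return (index, local_id)

    def get_activity_statuses(self, global_ids):
        statuses = {}
        for index, local_id in global_ids:
            backend = self.backends[index]
            statuses[(index, local_id)] = backend.get_activity_statuses([local_id])[local_id]
        return statuses
```

Because the front end speaks the same interface as its back ends, a client cannot tell whether it is talking to a single BES or to the metascheduler.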
BES Implementations
The Benchmark
Measure the overhead that the Grid middleware adds to job execution.
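One simple way to express this overhead, assuming the client records when it submitted each job and when it first observed completion, is turnaround time minus payload run time; with the 0-length jobs used in this benchmark the whole turnaround counts as overhead. The `JobRecord` type and the throughput definition below are illustrative assumptions, not the benchmark's actual code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    submitted: float      # client clock when the submission call was issued (s)
    completed: float      # client clock when completion was first observed (s)
    runtime: float = 0.0  # payload run time; 0 for the benchmark's 0-length jobs


def overhead(record: JobRecord) -> float:
    """Middleware overhead: turnaround time minus payload run time."""
    return (record.completed - record.submitted) - record.runtime


def summarize(records):
    """Mean per-job overhead and overall throughput of one benchmark run."""
    start = min(r.submitted for r in records)
    end = max(r.completed for r in records)
    return {
        "mean_overhead_s": mean(overhead(r) for r in records),
        "throughput_jobs_per_s": len(records) / (end - start),
    }
```

For example, three 0-length jobs submitted at 0 s, 0 s and 1 s and seen finished at 2 s, 3 s and 4 s give overheads of 2 s, 3 s and 3 s and a throughput of 0.75 jobs/s.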
Benchmark Variants
System Level Approach
Uses command-line clients provided by the middleware (MW).
Measures performance from the end-user perspective.
Includes client-side start-up overhead.
High load on the client machine is the limiting factor.
Utilizes bulk submission capabilities if available.
Component Level Approach
Directly uses the web service interface of the MW.
More appropriate for server performance measurements.
Only the overhead of making the WS call is included.
More complicated to adapt to a new MW stack.
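The practical difference between the two variants is what the timer brackets. In the sketch below an empty Python interpreter run (`sys.executable -c pass`) stands in for a middleware command-line client, and `fake_submit` is a hypothetical stand-in for a web service client stub; neither reflects the real tools measured in the paper.

```python
import subprocess
import sys
import time


def time_system_level(cmd):
    """Wall-clock time of one client command, including process start-up."""
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - t0


def time_component_level(call, *args):
    """Wall-clock time of one direct call; no process start-up involved."""
    t0 = time.perf_counter()
    call(*args)
    return time.perf_counter() - t0


def fake_submit(job_number):
    # Hypothetical stand-in for a WS client stub; returns a job id immediately.
    return f"job-{job_number}"


if __name__ == "__main__":
    # An empty interpreter run plays the role of a middleware CLI tool.
    cli = time_system_level([sys.executable, "-c", "pass"])
    ws = time_component_level(fake_submit, 1)
    print(f"system level: {cli:.4f} s, component level: {ws:.6f} s")
```

Even with no real work behind either call, the system-level figure is dominated by process start-up, which is why that variant includes client start-up costs.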
System Level Benchmark
Advantages
Measures end-user performance
Relatively simple to add new MW stack
Shows client side performance
Disadvantages
Includes client start up costs
Limited by client side performance
Difficult to simulate “real-life” usage scenarios
Problems in obtaining comparable results
Component Level Benchmark
Advantages
Measures service performance
Allows direct comparison between different MW stacks
and interfaces.
Lower client side load
Disadvantages
High development effort to add new MW or interface
Cannot benchmark client behavior (e.g. CLIQ)
Faces interoperability problems (e.g. security)
Benchmark Implementation
Serial benchmark (BM) for UAS and BES uses
a single thread.
System level BM uses a thread pool
to first submit all jobs and then poll
for status (except for CondorG and
CLIQ).
Concurrent BM for UAS and BES
uses two thread pools, one for
submitting jobs and one for polling
status.
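The serial and concurrent variants can be sketched with Python's `concurrent.futures`. `FakeService` is a hypothetical thread-safe stand-in for a UAS or BES endpoint that reports every job as finished a fixed delay after submission; the serial variant assumes each job is awaited before the next is submitted, and the pool sizes are arbitrary.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeService:
    """Hypothetical thread-safe endpoint: a job counts as finished
    `delay` seconds after it was submitted."""

    def __init__(self, delay=0.01):
        self.delay = delay
        self._lock = threading.Lock()
        self._submitted_at = {}

    def submit(self, n):
        job_id = f"job-{n}"
        with self._lock:
            self._submitted_at[job_id] = time.monotonic()
        return job_id

    def is_done(self, job_id):
        with self._lock:
            submitted_at = self._submitted_at[job_id]
        return time.monotonic() - submitted_at >= self.delay


def poll_until_done(service, job_id, interval=0.005):
    """Poll one job's status until it reports completion."""
    while not service.is_done(job_id):
        time.sleep(interval)
    return job_id


def serial_benchmark(service, n_jobs):
    """Single thread: submit a job, wait for it, then submit the next one."""
    return [poll_until_done(service, service.submit(n)) for n in range(n_jobs)]


def concurrent_benchmark(service, n_jobs, submit_threads=4, poll_threads=4):
    """Two pools: one issues submissions, the other polls submitted jobs."""
    with ThreadPoolExecutor(max_workers=submit_threads) as submitters, \
            ThreadPoolExecutor(max_workers=poll_threads) as pollers:
        submissions = [submitters.submit(service.submit, n) for n in range(n_jobs)]
        # Hand each job id to the polling pool once its submission has
        # returned (taken in submission order).
        polls = [pollers.submit(poll_until_done, service, s.result())
                 for s in submissions]
        return [p.result() for p in polls]
```

Keeping the pools separate means slow status polls do not occupy the threads that issue new submissions.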
Benchmark Limitations
Uses only 0-length jobs
All jobs are submitted at the beginning of a run
Uses only polling, no notifications
Needs command line client tools
Code for different MW stacks not unified
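Because the benchmark only polls, every status check is an extra request to the server. The sketch below contrasts polling with notification-style waiting on a hypothetical job handle built on `threading.Event`; it only illustrates the difference in request count and is not how any of the middleware stacks deliver notifications.

```python
import threading
import time


class ObservableJob:
    """Hypothetical job handle that can be polled or waited on."""

    def __init__(self):
        self._finished = threading.Event()
        self.status_requests = 0  # number of polls issued by the client

    def finish(self):
        # Server side: the job reached a final state.
        self._finished.set()

    def poll(self):
        # Each poll would be one WS round trip in a real deployment.
        self.status_requests += 1
        return self._finished.is_set()

    def wait_for_notification(self, timeout=None):
        # Blocks until finish() is called, without issuing status requests.
        return self._finished.wait(timeout)


def wait_by_polling(job, interval):
    """Poll repeatedly until the job reports that it has finished."""
    while not job.poll():
        time.sleep(interval)
```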
System Level Performance (UNICORE)
System Level Performance (Globus)
System Level Performance
Data staging has the biggest influence.
Bulk submission modes (CLIQ, CondorG) perform better than
many single job submissions.
Polling leads to congestion at the end of Globus
experiments, possibly caused by client start-up costs.
Relatively large spread between different runs of the
same experiment; a carefully controlled environment is
necessary.
UNICORE seems to be better adapted to the presented
benchmark.
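A simple cost model, assumed here rather than taken from the measurements, shows why bulk modes can win: if every client invocation pays a fixed cost (process start-up, connection and security setup) plus a per-job cost, n single submissions pay the fixed cost n times while one bulk submission pays it once. The parameter values in the usage lines are invented for illustration.

```python
def single_submission_time(n_jobs, per_call, per_job):
    """n separate client invocations: the fixed per-call cost is paid n times."""
    return n_jobs * (per_call + per_job)


def bulk_submission_time(n_jobs, per_call, per_job):
    """One bulk invocation: the fixed per-call cost is paid once."""
    return per_call + n_jobs * per_job


if __name__ == "__main__":
    # Invented numbers: 2 s fixed cost per call, 0.5 s per job, 100 jobs.
    print(single_submission_time(100, 2.0, 0.5))  # 250.0
    print(bulk_submission_time(100, 2.0, 0.5))    # 52.0
```

The gap grows linearly with the number of jobs and with the fixed per-call cost, so it is largest when client start-up is expensive.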
Component Level Performance (Serial)
BM as Stress Test Tool
Component Level Performance
Serial submission does not suffer from polling
congestion.
However, experiments show resource leaks and
memory problems.
Sensitive to submission latency and polling interval.
BES components comparable to legacy interfaces.
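The sensitivity to the polling interval follows from how completion is observed: if the client polls at times 0, p, 2p, ... after submission, a job that really finishes at time t is only seen at the first poll at or after t, so the measured turnaround can be up to p too long, while a smaller p means more status requests. The sketch assumes polls start at submission and that a poll at exactly t already sees the job as finished.

```python
import math


def observed_completion(true_completion, interval):
    """Time of the first poll at or after the true completion time."""
    return math.ceil(true_completion / interval) * interval


def polls_needed(true_completion, interval):
    """Number of polls (including the one at t = 0) until completion is seen."""
    return math.ceil(true_completion / interval) + 1
```

For a job finishing after 0.3 s, a 1 s interval reports completion at 1.0 s after 2 polls, while a 0.25 s interval reports 0.5 s but needs 3 polls.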
UAS Performance (Concurrent)
BES Performance (Concurrent)
Server Load (BES 16 Threads)
Client Load (BES 16 Threads)
Server Load (UAS 16 Threads)
Client Load (UAS 16 Threads)
Component Performance – Concurrent Jobs
BES shows some concurrency problems causing
performance to drop.
UAS shows balanced load between client and server.
UAS is slower than BES for single threads, possibly
because UAS jobs need to be started explicitly.
BES drops some jobs during concurrent submission,
around 2 jobs out of 750, perhaps due to server
overload.
More investigation is needed to find out whether the bug
is in the BES or in the BM implementation.
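One way to narrow down whether lost jobs are a BES or a BM problem is to compare three lists: jobs the client believes it submitted, jobs the client saw finish, and jobs the server knows about. The sketch assumes the client tags each submission with its own identifier so that a failed submission can still be accounted for; the function names and example ids are hypothetical.

```python
def find_dropped(submitted_ids, finished_ids):
    """Jobs the client submitted but never observed in a final state."""
    return sorted(set(submitted_ids) - set(finished_ids))


def drop_rate(submitted_ids, finished_ids):
    """Fraction of submitted jobs that were never seen finishing."""
    return len(find_dropped(submitted_ids, finished_ids)) / len(set(submitted_ids))


def classify_drops(submitted_ids, finished_ids, server_ids):
    """Split dropped jobs by whether the server ever registered them."""
    server = set(server_ids)
    dropped = find_dropped(submitted_ids, finished_ids)
    return {
        # Never registered: suggests a problem in submission handling.
        "unknown_to_server": [j for j in dropped if j not in server],
        # Registered but never seen finishing: suggests a problem in
        # execution or in the client's status polling.
        "known_to_server": [j for j in dropped if j in server],
    }
```

With 2 of 750 jobs missing, `drop_rate` gives 2/750, roughly 0.27 %.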
Conclusions
Type of service dominates over mechanism
Careful control of test environment is needed
Installation and configuration of MW takes time
Preliminary results show that BES is comparable to
legacy mechanisms
However, UNICORE BES currently cannot handle
concurrent requests as well as UAS does.
BM experiments have been able to uncover a number of
implementation bugs in early BES services.
Further Work
Use more than one client node to stress the server.
Use BES Activity in addition to BES Factory.
Extend concurrent BM to WS-GRAM and GT BES.
Use jobs longer than 0 length.
Allow extended job submission simulating a steady
state.
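Simulating a steady state mainly changes the submission schedule: instead of submitting every job at the start of a run, jobs arrive at a fixed rate. The sketch below is one possible open-loop pacing loop; `submit` is any callable that submits job n, and the injectable `clock` and `sleep` parameters are an assumption added so the schedule can be checked without waiting.

```python
import time


def paced_submit(submit, n_jobs, rate_per_s, clock=time.monotonic, sleep=time.sleep):
    """Submit n_jobs at a fixed arrival rate instead of all at the start.

    Job n is scheduled at start + n / rate_per_s (open-loop pacing): the
    schedule does not wait for earlier jobs to finish.
    """
    start = clock()
    ids = []
    for n in range(n_jobs):
        target = start + n / rate_per_s
        delay = target - clock()
        if delay > 0:
            sleep(delay)
        ids.append(submit(n))
    return ids
```

At 2 jobs/s, four jobs are submitted at 0, 0.5, 1.0 and 1.5 s after the start, so the server sees a constant arrival rate rather than one burst.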
Acknowledgements
This work is supported by the
European Commission
through the OMII-Europe
project INFSO-RI-031844.
Software can be downloaded
from the project repository at:
http://www.omii-europe.org
Or contact Gilbert Netzer:
noname@pdc.kth.se