BeesyBees – Efficient and Reliable Execution of
Service-based Workflow Applications for
BeesyCluster using Distributed Agents
Paweł Czarnul, Mariusz Matuszek, Michał Wójcik and Karol Zalewski
Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology
Email:{pczarnul,mrm}@eti.pg.gda.pl
Abstract—The paper presents an architecture and implemen-
tation that allows distributed execution of workflow applications
in BeesyCluster using agents. BeesyCluster is a middleware that
allows users to access distributed resources as well as publish
applications as services, define service costs, grant access to
other users and consume services published by others. Workflows
created in the BeesyCluster middleware are exported to BPEL
and executed by agents in a distributed environment. As a
proof of concept, we have implemented a real workflow for
parallel processing of digital images and tested it in a real
cluster-based environment. Firstly, we demonstrate that engaging
several agents for distributed execution is more efficient than a
centralized approach. We also show increasing negotiation time in
case of too many agents. Secondly, we demonstrate that execution
in the proposed environment is reliable even in case of failures.
If a service fails, a task agent picks a new equivalent service at
runtime. If one of the task agents fails, one of the remaining agents
takes over its responsibilities. The communication between the
middleware, agents and services is encrypted.
I. INTRODUCTION
Integration of various systems and components is one
of the most crucial challenges that needs to be addressed
today. In service-based environments this requires efficient
algorithms for scheduling workflows, i.e. selection of services
for particular tasks while meeting optimisation goals. In the
literature [1], [2] the workflow is usually defined as a directed
acyclic graph G(V,E) where vertices V denote tasks to be
executed while edges E denote time dependencies between the
tasks. A workflow can be concrete, if there is one service to be
executed for each task, or abstract, in which case there is a set
of functionally equivalent services for each task, possibly with
different QoS parameters. The latter almost always include
execution time and cost and, if needed, others such as reliability,
reputation of the provider etc. One service needs to be chosen
for each task such that a global goal is optimized, possibly
while meeting other QoS constraints. Typical optimisation goals
include minimisation of execution time with an upper bound
on the total cost of selected services, which makes the problem
NP-hard. Such workflows are executed by workflow engines
which invoke services, control their statuses and transfer data.
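To make the selection problem concrete, the following is a minimal sketch in Python, not the solver used in BeesyCluster. It assumes a tiny abstract workflow with hypothetical task names, service names, times and costs, computes the workflow execution time as the longest path through the graph, and exhaustively tries every combination of services under a cost bound. Exhaustive search is only feasible for very small workflows, which is consistent with the problem being NP-hard.

```python
from itertools import product

# Hypothetical abstract workflow: task -> list of (service name, time, cost).
SERVICES = {
    "t1": [("s1a", 4.0, 1.0), ("s1b", 2.0, 3.0)],
    "t2": [("s2a", 5.0, 1.0), ("s2b", 3.0, 2.0)],
    "t3": [("s3a", 1.0, 1.0)],
}
# Edges E of the graph G(V, E): (predecessor, successor).
EDGES = [("t1", "t3"), ("t2", "t3")]


def makespan(times, edges):
    """Length of the longest (critical) path, given per-task execution times."""
    preds = {t: [] for t in times}
    for u, v in edges:
        preds[v].append(u)
    finish = {}

    def finish_time(task):
        # A task starts when its latest predecessor has finished.
        if task not in finish:
            start = max((finish_time(p) for p in preds[task]), default=0.0)
            finish[task] = start + times[task]
        return finish[task]

    return max(finish_time(t) for t in times)


def select_services(services, edges, max_cost):
    """Pick one service per task, minimising makespan under a total cost bound.

    Returns (makespan, total cost, {task: service name}) or None if no
    combination satisfies the cost bound.
    """
    tasks = list(services)
    best = None
    for combo in product(*(services[t] for t in tasks)):
        cost = sum(c for _, _, c in combo)
        if cost > max_cost:
            continue
        times = {t: s[1] for t, s in zip(tasks, combo)}
        span = makespan(times, edges)
        if best is None or span < best[0]:
            best = (span, cost, {t: s[0] for t, s in zip(tasks, combo)})
    return best
```

With the sample data and a cost bound of 5.0, the sketch selects s1a, s2b and s3a; raising the bound to 6.0 allows the faster s1b to be used as well.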
II. PROBLEM STATEMENT
Workflows represent an integration of tasks, in which a
set of alternative services may be defined (either manually
or automatically) for each task. Alternatives may be statically
linked to a task or dynamically assigned at runtime. Regardless
of which service is selected for a task, the workflow should
be executed in such a way that the efficiency and reliability of
workflow execution are maximised. We assume that the
workflow definition as well as the optimisation criteria and
solver are already given [3] which means that services selected
for particular tasks are known.
At this point, the problem becomes how to manage the
execution of a distributed workflow application so that the
following goals are optimized and maintained:
• efficiency — because of potentially high communication
latency and low bandwidth of WANs, communication
between services through agents close to interacting ser-
vices is faster than through a distant centralised execution
engine;
• reliability — since the node(s) on which the workflow
execution engine runs as well as service locations may
be geographically distributed, it is likely that at times
connections may be lost; the solution should ensure
reliable and continuous execution in such cases;
• security — all service invocations and data flows should
be performed in a fully secure manner.
We present an agent-based distributed management mecha-
nism that addresses all of the above and examine its perfor-
mance in a practical example executed in a real environment.
III. RELATED WORK
A. Workflow Types and Workflow Management Environments
The literature provides examples of many middlewares
and environments for workflow execution. First and foremost,
these differ in the type of target workflows to be executed.
Scientific workflows usually focus on integration of compu-
tational services into chains executed on HPC resources such
as clusters and supercomputers, or analysis of data acquired
from input sensors or devices by services installed on powerful
machines. Such workflows can span over various geographi-
cally distributed sites and virtual organisations forming grid
systems [4], [5] and usually focus on data flow. There are
several systems [4] that support such applications including
Kepler [6], Gridbus, Triana [7], Pegasus [8], P-GRADE [9],
Directed Acyclic Graph Manager (DAGMan), ICENI [10],
UNICORE, Taverna, GridFlow, GrADS, Askalon, GridAnt.
Proceedings of the International Multiconference on
Computer Science and Information Technology pp. 173–180
ISBN 978-83-60810-27-9
ISSN 1896-7094
978-83-60810-27-9/09/$25.00 © 2010 IEEE
Conversely, business workflows focus on discovery, selec-
tion and integration of services offered by various parties so
that certain QoS parameters are met. Such workflows focus
more on controlling the flow and often incorporate constructs
such as choice, loop and other conditionals. Business work-
flows usually consider many more QoS parameters apart from
the execution time and cost such as reliability, accessibility,
fidelity, conformance, security etc. Systems such as Meteor-S
[11], [12] focus on automating the process of service discovery
and selection using an ontology-based approach. Selected
services are executed automatically making service discovery,
selection and execution almost fully automatic, based on the
given workflow specification and given services and their
descriptions. BPEL [13] is often used for describing business
oriented workflows and includes control and data flows as well
as service invocations. There are several execution engines
for workflows specified in BPEL such as The ActiveBPEL
Engine [14], bexee [15] or Silver [16].
Thirdly, in the context of pervasive and ubiquitous com-
puting, workflows often react to events asynchronously and
more importantly the context is considered for invocation of
a service and changing the state. The latter is defined as
information defining the state of an object [17]. As an example,
uWDL [18] is used for describing ubiquitous workflows and
allows specification of both the services as well as the context
and service flow through the node and the link elements
respectively.
B. Existing Service-based Solution in BeesyCluster
The already existing workflow support module in Beesy-
Cluster developed by our research group [3], [19] contains a
workflow editor, an optimiser and execution engine and has
already been tested on a variety of scientific and business
workflow applications, such as multimedia processing, numer-
ical simulations [19] or business workflows [3].
BeesyCluster is a middleware that allows its users to invoke
sequential or parallel applications exposed as services on
registered system accounts on various clusters and servers.
Such services can be assigned to particular workflow tasks
in BeesyCluster’s editor. The workflow editor is implemented
by an applet. Created workflows are saved in BeesyCluster’s
database. Based on service execution times and costs, Beesy-
Cluster’s optimiser selects one service for each task so that
the given criteria are optimised. It can optimise either a linear
combination of the total cost of selected services and the
workflow execution time or e.g. the execution time with a
global constraint on the total cost of selected services. The
execution engine is implemented within a Java EE server. For
each workflow node, a SIMessageBean is used: a message-driven
bean responsible for executing the service chosen by the
optimizer for the particular workflow task. This is
done within the onMessage method, which is executed upon
receiving a JMS message. For the given task, after the service
has been executed, the bean copies output files to dedicated
directories of services chosen for successor tasks and initiates
execution of following tasks by sending JMS messages. The
execution status of a particular node instance is updated in the
database dynamically.
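The message-driven control flow described above can be illustrated with a short, self-contained Python sketch. It is not the Java EE implementation: a plain queue stands in for JMS, a dictionary stands in for the status table in the database, and the function and parameter names are hypothetical. The sketch also assumes that a task with several predecessors is started only once all of them have finished, which the text does not state explicitly.

```python
from collections import deque


def run_workflow(tasks, edges, run_service):
    """Centralised, message-driven execution of a workflow graph.

    tasks       -- list of task names
    edges       -- list of (predecessor, successor) pairs
    run_service -- callable invoked with the task name (stands in for
                   executing the selected service and copying its output)
    """
    succs = {t: [] for t in tasks}
    pending = {t: 0 for t in tasks}  # unfinished predecessors per task
    for u, v in edges:
        succs[u].append(v)
        pending[v] += 1

    # Stand-in for the JMS queue: initially holds all tasks without predecessors.
    inbox = deque(t for t in tasks if pending[t] == 0)
    # Stand-in for the per-node execution status kept in the database.
    status = {t: "waiting" for t in tasks}
    order = []

    while inbox:
        task = inbox.popleft()  # analogous to onMessage being triggered
        status[task] = "running"
        run_service(task)
        status[task] = "done"
        order.append(task)
        # "Send messages" to successors whose predecessors are all done
        # (assumption of this sketch).
        for nxt in succs[task]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                inbox.append(nxt)
    return order, status
```

Because every message passes through the single loop, all control and data handling is concentrated in one place, which is the property the distributed agents are meant to relax.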
The drawback of this solution is centralized management
of execution. If large data is passed between services, it
needs to be passed through BeesyCluster which increases
the workflow execution time. We propose to optimize this
by launching several distributed JADE agents that would
launch services for workflow tasks and act as local proxies
for communication between the services. Comparison to other
agent-based approaches is presented in Section V.
IV. PROPOSED AGENT-BASED SOLUTION
As stated in the previous section, BeesyCluster’s execution
engine is implemented within a Java EE server. This approach
allows for easy execution management, but it can also create
a bottleneck for efficient execution and a single point of
failure. Should a problem with either the server or its network
connectivity occur, the whole workflow execution may be at
risk. This in itself is a serious limitation.
By using a set of well defined, industry standard conformant
interfaces, we migrated the task of execution of a prepared
workflow to an agent-based environment, which we nicknamed
BeesyBees. By separating the two tasks we also gained a
possibility of having a pluggable architecture, allowing for
experimentation with different approaches to workflow execu-
tion. By implementing the execution management in mobile
agents we take advantage of enormous flexibility of this
environment.
In this paper, we concentrate on increased efficiency and
reliability of the workflow execution, which is the main
difference from the original BeesyCluster solution.
A. Architecture
The BeesyBees system is based on the JADE (Java Agent
DEvelopment Framework) [20] agent system implementing
the FIPA [21] communication and the OMG MASIF [22] man-
agement standards. It provides distributed and decentralized
workflow execution for BeesyCluster using agents.
BeesyCluster itself can be regarded as a middleware that of-
fers an easy-to-use WWW interface to access various accounts
on various geographically distributed clusters and servers
through a single account in BeesyCluster. This allows man-
agement of files and directories on such clusters and servers,
compiling and running sequential and parallel applications
with support for various queuing systems, a teamwork environment
with data sharing, and others. Furthermore, users can
publish their own applications as services within BeesyCluster and
make them available to other BeesyCluster users. A provider
can specify a cost the client would need to pay for invoking
the service. BeesyCluster contains a subsystem for handling
virtual payments between users. Access to clusters and servers
as well as running particular commands, searching for and
running services is also possible through the Web Service
interface [23], [24].
Figure 1 shows a generalised diagram for BeesyBees system
architecture. There are four types of agents implemented in the
system:
PROCEEDINGS OF THE IMCSIT. VOLUME 5, 2010
• GateWayAgent — receives workflow descriptions in
BPEL and spawns TaskAgents,
• GraphAgent — monitors the progress of workflow ex-
ecution and gives a graphical representation of agent
instances and decisions made,
• StatusAgent — collects reports on task execution states
from TaskAgents and persists the current status of workflow
realization,
• TaskAgent — executes a single workflow task. A group
of TaskAgents executes the workflow cooperatively.
A workflow is composed using BeesyCluster’s SIEditor
applet, a tool for modeling workflows that consist of tasks
with services assigned out of those defined in BeesyCluster.
For each task, one or more services can be chosen.
First, the definition of the workflow to be run is fetched from
BeesyCluster’s database and saved in BPEL, which is then
handed to the BeesyBees system, where it is received by the
GateWayAgent. All communication between BeesyCluster
and BeesyBees is handled by appropriate web services. Then
the GateWayAgent spawns TaskAgents in order to execute
the workflow’s tasks.
TaskAgents negotiate the assignment of each task [25].
A TaskAgent interested in executing a particular task sends its
proposal, including its matching score, to all other agents.
The score is an object which can be filled with various
comparable metrics, for example the load of the machine
the agent resides on. When an agent receives another agent’s
execution proposal, it agrees only when it is not interested
in that particular task or its own matching score is lower. This
process is represented in Figure 2. Finally, the TaskAgent
which received only approvals begins task execution. When
the task is done, the TaskAgent sends a notification to all agents
executing the particular workflow, stating which
task has been done and which service was used. An example of
such a notification is presented in Figure 3.
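The agreement rule of this negotiation can be sketched in a few lines of Python. This is an illustration only, not the JADE implementation: the score is assumed to be a single comparable number, None marks an agent that is not interested in the task, and the function names are hypothetical. The text does not say how equal scores are resolved, so in this sketch a tie produces no winner.

```python
def agrees(own_score, proposal_score):
    """Reply of one agent to another agent's proposal for a task.

    The agent approves only if it is not interested in the task
    (own_score is None) or its own matching score is lower.
    """
    return own_score is None or own_score < proposal_score


def negotiate(scores):
    """Return the agents whose proposals were approved by all other agents.

    scores -- dict mapping agent name to its matching score for one task,
              or None if the agent is not interested.
    """
    winners = []
    for proposer, score in scores.items():
        if score is None:
            continue  # uninterested agents send no proposal
        others = (s for name, s in scores.items() if name != proposer)
        if all(agrees(other, score) for other in others):
            winners.append(proposer)
    return winners
```

For example, with scores of 0.3 for one agent, 0.8 for a second and an uninterested third agent, only the second agent collects approvals from everyone and starts the task.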
The execution state of workflow’s tasks is monitored by a
StatusAgent. Messages sent by TaskAgents during workflow
execution are monitored. These include information whether a
task has been executed, or if there were problems with calling
services. That data is saved persistently by the StatusAgent so
that it can be used for recovery after a crash of the whole
system. Additionally, a GraphAgent, which shows a graphical
representation of the workflow execution, can be turned on.
B. Efficiency
The implementation launches a certain number of software
agents which negotiate which of them are responsible for
execution of particular tasks. The agents may run on
a defined number of containers. This allows deployment and
testing of both a fully centralized architecture, in which one
agent acts as a central management point, and a fully distributed
approach with several agents and containers. Obviously, the
optimal number of agents and containers may depend on the
size of the workflow as well as locations of actual services.
Starting too many agents results in too much overhead for
negotiation.
Figure 2. Negotiation between Agents
Figure 3. Notifications about Current Task Status
C. Fault tolerance
BeesyBees was designed with fault tolerance in mind.
Currently, it is tolerant of service and TaskAgent failures.
When either fails, the failure is recognised and either the
respective flow path is restarted or an alternative service is
sought and used. During task execution, a TaskAgent
blocks access to its assigned task by rejecting proposals
concerning execution of that task. This is what makes failure
tolerance possible: when a TaskAgent fails, nothing
blocks the uncompleted task any longer, so one of the remaining
agents picks it up and proposes its execution to the other agents.
A service failure is detected after a specified number of unsuccessful
retries. When this condition occurs, an alternative service is
chosen if available.
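The service-level part of this behaviour (retry, then fall back to an equivalent service) can be sketched in Python as follows. The function name, the default of three attempts and the use of a generic exception to signal failure are assumptions made for illustration; the paper only states that the number of retries is specified.

```python
def execute_task(services, invoke, max_retries=3):
    """Run a task using the first equivalent service that succeeds.

    services    -- functionally equivalent services, in order of preference
    invoke      -- callable that runs one service and raises on failure
    max_retries -- attempts per service before it is considered failed
                   (hypothetical default)
    """
    for service in services:
        for _attempt in range(max_retries):
            try:
                return service, invoke(service)
            except Exception:
                # Any error counts as an unsuccessful attempt in this sketch.
                continue
        # All attempts failed: fall through to the next alternative service.
    raise RuntimeError("all equivalent services failed")
```

The returned pair names the service that was actually used, which matches the notification content described earlier (which task was done and using which service).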
D. Security
At this point, we assume that the agent environment is
managed by a group of trusted entities. We use security
mechanisms implemented in JADE. Each container has its own
certificate signed by our Certification Authority (CA) and it
communicates with other containers using SSL.
In the original solution, BeesyCluster executes services on computing
nodes using SSH. Each computing node has its own record
on BeesyCluster’s main server that contains its public
key. In the agent-based approach we used a similar method.
Each machine that contains an agent container has its own
list of trusted computing nodes. Each agent can establish an
SSH connection from the machine where it actually runs to
communicate with computing nodes and invoke services.
Figure 1. System Architecture (diagram labels: BeesyCluster, SIEditor,
Optimizer, WebService, BeesyBees, GateWayAgent, StatusAgent,
GraphAgent, TaskAgent, JADE; legend: web service call, messaging,
spawning agent)
The gateway agent is exposed as a Web Service and
serves as a proxy between BeesyCluster and BeesyBees. The
communication between BeesyCluster and the gateway agent
is secured using HTTPS and certificates. There is always a
possibility of attacks exploiting vulnerabilities in the software or
solutions that we depend on. In [26] such a problem, as
well as a way to mitigate its adverse effects, is presented.
V. OTHER AGENT-BASED APPROACHES TO WORKFLOW
MANAGEMENT
One similar project is JBees [27],
a workflow management system based on agent technology
that combines collaboration agents and coloured Petri nets.
As opposed to BeesyBees, which uses BeesyCluster as the system
for workflow creation, JBees makes use of built-in management
agents providing a user interface for the human workflow
manager. Moreover, resources in JBees, which can be compared
to BeesyCluster services, are strictly integrated with the
system and are represented by resource agents. The idea of
process agents and storage agents is similar to BeesyBees
task agents and the status agent. Process agents are responsible
for the execution of particular cases. Storage agents collect
information from process agents (in JBees this is done
through the monitor agent, as opposed to BeesyBees, where
task agents communicate directly with the status agent) and
make it persistent. As mentioned, resources in JBees are
accessed by process agents requesting task execution through
resource agents. In BeesyBees, services are called directly by
task agents. As for workflow description, JBees makes use of
coloured Petri nets [28].
WS2JADE, presented in [29], is a tool for runtime deployment
and control of web services. It is of interest because
it uses the same agent system, JADE. The main goal of this
project is integration of agent systems and web services, which is
achieved by representing each web service by a specialised
agent. Similarly to JBees, web services are called through
their associated agents. In order to call a web service, a
client agent searches DF (Directory Facilitator, JADE built
in service directory agent) for it, then if the service is not
present, DF can trigger WS2JADE to look it up in the web
service environment. If an appropriate service is found, the
web service agent, which registers its service in DF, is created.
Finally, the service is called by exchanging messages between
agents.
SwinDeW-A [30] integrates services using WS2JADE by
enhancing SwinDeW (Swinburne Decentralised Workflow)
with agents. As opposed to BeesyBees, which uses BeesyCluster’s
optimiser to choose the best services for a specified task,
SwinDeW-A makes use of negotiation agents which are able
to negotiate with a number of service agents. In order to
choose the most suitable service, a Service Level Agreement
(SLA) needs to be formed between the negotiation agent and
each service provider. Agents can negotiate non functional
parameters (such as time, cost, availability), which are similar
to those proposed in BeesyCluster [3].
Finally, a relatively recent development is WADE (Work-
flow and Agents Development Environment) [31], which is
an extension of JADE agent framework, facilitating both the
possibility to define agent tasks according to the workflow
metaphor and an architecture with mechanisms allowing ad-
ministration of distributed WADE based applications. Work-
flows in WADE are expressed as Java classes.
VI. SIMULATIONS
A. Testbed Workflow and Environment
As an example, we have run a workflow application for
parallel processing of RAW digital images in order to produce
a Web album. The process uses standard steps in professional
photo editing ($1 is the input file name):
• rawtotiff — conversion from RAW to lossless
TIFF. Implemented using the dcraw converter as dcraw -T $1
• normalize — normalisation of the image using ImageMagick’s
convert as convert $1 -normalize $1
• sharpen — sharpening the image implemented as
convert $1 -sharpen 1x1.2 $1.jpg
• resize — resizing the image implemented as
convert $1 -resize 600x400 $1.jpg
• albumgeneration — implemented by either jigl or
album.
Figure 4. Visualisation of Workflow Execution
Figure 5. Visualisation of Workflow Execution with Service Failure
Figure 7. Testbed Workflow in BeesyCluster’s Editor
All those services are bash scripts executed through ssh. Input
for each service contains pictures being processed.
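As an illustration of how the listed commands are parameterised by the input file name, the Python sketch below expands each service into an argument list. Only the command lines quoted in the list above are taken from the paper; the table and function names are hypothetical, the album generation step is omitted because its command line is not given, and nothing is executed.

```python
import shlex

# Command templates copied from the service list above; {f} stands for $1.
SERVICE_TEMPLATES = {
    "rawtotiff": ["dcraw", "-T", "{f}"],
    "normalize": ["convert", "{f}", "-normalize", "{f}"],
    "sharpen": ["convert", "{f}", "-sharpen", "1x1.2", "{f}.jpg"],
    "resize": ["convert", "{f}", "-resize", "600x400", "{f}.jpg"],
}


def build_command(service, filename):
    """Return the argument list for one service applied to one input file."""
    return [part.format(f=filename) for part in SERVICE_TEMPLATES[service]]


def as_shell(service, filename):
    """Render the same command as a safely quoted shell string."""
    return shlex.join(build_command(service, filename))
```

Keeping the commands as argument lists rather than concatenated strings avoids quoting problems when file names contain spaces, which matters if such commands are forwarded over ssh.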
Figure 7 presents the testbed workflow created in
BeesyCluster’s editor. Parallel paths include nodes running
rawtotiff, normalize, resize and sharpen filters
while the final node gathers resulting jpg images and produces