Self-Gridron: Reliable, Autonomous, and
Fully Decentralized Desktop Grid Computing System
based on Neural Overlay Network
EunJoung Byun1, HongSoo Kim1, SungJin Choi2, SangKeun Lee1,
Young S. Han3, JoonMin Gil4, SoonYoung Jung5
1Department of Computer Science and Engineering, Korea University, Korea
2Department of Computer Science and Software Engineering, University of Melbourne, Australia
3Department of Information Media, Suwon University, Korea
4Department of Computer Science Education, Catholic University of Daegu, Korea
5Department of Computer Science Education, Korea University, Korea
Abstract - Although desktop Grid computing has been
regarded as a cost-efficient computing paradigm, the system
has suffered from scalability issues caused by its centralized
structure. In addition, resource volatility generates system
instability and performance deterioration. However,
regarding the provision of a reliable and stable execution
environment, resource management becomes more intricate
when the system is constructed in a fully decentralized fashion
without a central server. Scaling the system numerically and
geographically requires autonomous network organization,
facile adaptation to execution failure, and dynamic
self-management of volatile resources. In order to
develop a fully decentralized desktop Grid computing system
securely, we propose an autonomous desktop Grid computing
system, Self-Gridron based on a neural overlay network. Self-
Gridron supports reliable, autonomous, and cost-effective
scheduling which includes eligible resource classification and
job management (i.e. allocation, replication, and
reassignment). Furthermore, Self-Gridron provides sovereign
learning with error correction and evolves adaptively by
itself to system changes or failures on the fly while improving
performance.
Keywords: Desktop Grid Computing, Overlay Network,
Neural Network, Volatility, Availability, Credibility,
Resource Management, Autonomy.
1 Introduction
Grid computing [1, 2] is a platform that allows
organizations to share resources owned by other institutes
through Virtual Organizations (VOs) over the network.
Desktop Grid computing [7] is a paradigm that achieves large
computing power at low cost by harnessing idle computing
resources pervading throughout the Internet. Several desktop
Grid systems have been developed, such as SETI@home [5],
BOINC [4], Bayanihan [3], Javelin [10], XtremWeb [8],
and Charlotte [6], under various labels such as volunteer
computing [3], P2P Grid computing [14, 15, 16], global
computing [6, 8], public computing [4], resource-providing
computing, etc.
Resource capability and stability are the essential
differences between traditional Grid computing and desktop
Grid computing. In a Grid computing system, resources
consist of high-end machines or instruments such as
supercomputer or expensive scientific equipment integrated
by a reliable wide bandwidth network environment, and
dedicated exclusively to the system. In contrast, desktop Grid
computing resources consist of comparatively low-end
machines such as desktop computers or notebook computers
that can leave and join at will.
Many existing desktop Grid systems [3, 4, 6, 7, 8]
adopt a centralized architecture, with a central server
managing the resource pool and the task pool while mapping
(i.e. scheduling) them to each other. In the case of
SETI@home, the central server plays the role of scheduler
and resource manager. However, a central server can create
a single point of failure and become overloaded, even though
the centralized structure has strong stability. The most
potentially fatal problem of a central-server-based system is
its lack of scalability as the system grows in numbers and
geography.
A decentralized structure is comparatively attractive for
dealing with large-scale desktop Grid systems, provided that
reliable management can be achieved. To do so, a reliable
overlay network must be adopted; however, few studies have
applied a reliable distributed overlay network.
A few studies on desktop Grid computing such as CCOF
[18], Organic Grid [17], Messor [19], and Paradropper [29]
have applied an overlay network to support distributed
management. Even though these existing desktop Grid
systems based on an overlay network improve extensibility
and reduce risks arising from decentralized structure, they do
not function well enough in reliable resource selection and
fault tolerance against execution failure. Since existing
systems do not consider desktop volatility, they tend to suffer
job execution failure in the middle of execution and repeated
reallocations, resulting in delayed completion time.
In order to ensure a stable and adaptive desktop Grid
computing system, it is therefore necessary to have a reliable
overlay network while providing eligible resource selection,
an appropriate number of replications, and autonomous
network construction and management.
In this paper, as a stable overlay network environment,
we introduce a neural network structure to the desktop Grid
system. A neural network [22, 23, 24] is a large-scale
network of connected nodes, each with simple intelligence to
process the sum of weighted inputs in a parallel manner.
Through the given inputs and a learning process, the network
aims to reach a steady state. The parallelism of a neural
network is very well suited as an execution model for the
massively parallel applications of a desktop Grid system in a
distributed manner. In addition, desktop Grid computing on
a neural overlay network offers additional benefits, as the
learning process improves system reliability and decision
accuracy.
This paper proposes a dependable, autonomous, and fully
distributed desktop Grid computing system, Self-Gridron,
based on a reliable neural network overlay. Self-Gridron is a
compound word of Grid and perceptron, which represents
transplantation of neural network onto desktop Grid
computing system in a self-governing (i.e. autonomous) and
fully decentralized manner. Each node (i.e. unit, cell, neuron,
desktop, volunteer, host, or resource) in Self-Gridron system
is called Gridron. Self-Gridron is a heuristic technique for
decision making and pattern recognition within complicated
models. As the system structure shifts toward a fully
decentralized environment, the management and scheduling
scheme must adapt in the face of dynamic participation. Self-
Gridron provides autonomous scheduling and management of
resources, job, and network (i.e. architecture) to improve
accuracy in decision making. Scheduling schemes on Self-
Gridron provide rapid selection of eligible and reliable
resources and cost-efficient redundancy (i.e. replication) for
job completion based on criteria given by a base overlay
network. In Self-Gridron, tasks are distributed over the
neural overlay network according to the thresholds of nodes
(i.e. desktop, volunteer, host, or resource provider) and the
weights between them. The replication process is also
performed on the basis of threshold and connection strength,
trained via error correction against wrong predictions.
In this paper, Self-Gridron is exploited to organize
adaptive and reliable self-governing desktop Grid computing
system. We demonstrate that the proposed system improves
both performance and reliability. Through experimentation
and simulation we especially obtained a reduced task failure
rate in the middle of execution through accurate prediction of
nodal execution tendency based on a neural overlay network.
The remainder of the paper is organized as follows.
Section 2 describes related work on overlay networks in
Grid systems and combinations of neural networks and Grid
computing. Section 3 describes the basic characteristics and
details of Self-Gridron. Section 4 presents experimental
results and performance evaluation. Section 5 concludes this
paper.
2 Related works
2.1 Overlay network in Grid computing
Organic Grid [17] proposed a tree overlay network and a
scheduling algorithm operated on the network in a distributed
structured Grid. The tree-shaped overlay network constructs
the job tree based on the performance of resource providers.
High performance nodes are assigned to higher levels (i.e.
near by the root) of the trees whereas lower performance
nodes are positioned at lower levels (i.e. far from the root) of
the tree. Organic Grid based on tree-based overlay network is
constructed dynamically while providing self-organization.
CCOF [18] developed a wave scheduler supporting
time-zone awareness, based on the CAN [20] overlay network,
to classify nodes into day and night zones. In CCOF, there is
no central server and no role limitation, in that any node can
be a server (i.e. consumer), a client (i.e. donor), or both.
Messor [19] applied a self-organizing overlay network
based on peer-to-peer technology to provide load balancing
with ant (i.e. autonomous agent) and manage the network. In
Messor, a workload-based random graph is generated
dynamically in an unstructured random network for parallel
execution. The mobile agents perform load balancing by
fetching jobs from overloaded nests to lightly loaded nests.
Paradropper [29] provided an overlay network based on
small-world characteristics. The small-world graph is
spawned randomly, while Paradropper supports load
balancing between unevenly loaded nodes.
However, all of these systems suffer delayed job
completion times due to repeated reallocation of failed jobs,
since they do not consider the volatility of constituents
(disappearing resources or execution failures) in the middle
of execution.
2.2 Combination of Grid and neural network
CNNTS [11] proposed a chaotic neural network
algorithm for task scheduling to minimize the total execution
time of tasks. It suggested an allocation (i.e. mapping) scheme
to schedule tasks to processors in a parallel environment.
However, it is limited when adapted to a dynamic desktop
Grid since it concentrated on appropriate mapping for task
scheduling in stable parallel computing.
N2Grid [12], an artificial neural network simulation
environment, is a framework that applied neural network
paradigms to use neural network resources and functions.
However, N2Grid is not a Grid computing system but rather a
simulator focusing on the development of a neural network
environment based on Grid technology.
NeuroGrid [13] implemented a neural network to train
and evaluate the system within a Grid environment. However,
since NeuroGrid is based on a typical Grid that uses
comparatively reliable resources and network, it is inadequate
for a dynamic desktop Grid system. In addition, NeuroGrid
focused on implementation of neural network in a Grid for
potential neural network applications.
2.3 Scheduling schemes in desktop Grid
XtremWeb [8] and Korea@Home [25] applied First
Come First Served (FCFS), where jobs are allocated to
desktops in order of request.
Many desktop grid computing systems, including
Bayanihan [3], Charlotte [6], and Javelin [10] used an eager
scheduling scheme. To improve execution performance,
eager scheduling allocates a task to the fastest desktop and
reallocates failed or unfinished tasks from one participant to
another.
However, the above scheduling schemes support only
centralized structures and do not consider the dynamic and
unexpected execution properties of desktop resources. Besides,
existing systems suffer from delayed total execution
completion time (i.e. makespan or turnaround time) due to
repeated reallocation caused by execution failure in the
middle of execution.
2.4 Replication methods in Grid computing
Bayanihan [3] used task replicas to provide sabotage
tolerance and fault tolerance.
BOINC [4] applied redundant computing to prevent
erroneous results.
In [26], R. Rahman et al. proposed a best-replica
selection scheme to minimize access latency, using a neural
network concerned with access latency and bandwidth
consumption in a data Grid environment.
However, these studies do not consider resource
characteristics (i.e. desktop volatility, availability, etc.), and
they do not provide qualified replica selection or an
appropriate number of replicas based on system properties.
3 Self-Gridron: Reliable and autonomous
fully distributed desktop Grid system
We introduce a neural network as an overlay network to
constitute Self-Gridron, providing reliable scheduling and
replication schemes. The synergy of combining desktop
Grid and neural network brings many advantages, such as
reliable decision making in an unpredictable large-scale
computing environment. This section presents how Self-
Gridron constructs its composition and how it functions:
the construction process, basic scheduling and management
operations (i.e. allocation, reallocation, replication, etc.), and
contributions.
3.1 Neural Overlay Construction in Self-Gridron
The Self-Gridron system supports massive parallelism with
disseminated representation and distributed control of
network configuration, since information is spread over
several nodes in the network. It is based on a neural overlay
network that modifies the existing neural network
configuration to support reliable desktop Grid computing.
Self-Gridron consists of five basic elements: the processing
node (i.e. Gridron, desktop, host, or element), input, weight,
a credibility-checking function for activation (i.e. a combining
function over the inputs), and a transfer function (i.e. an
activation function producing outputs).
In Self-Gridron, the connection strength between Gridrons is
given as Wji, the weight from node i to node j. Each weight
represents the degree of trustworthiness between two
Gridrons. In addition, the weight on each link indicates how
much the incoming input affects the activation of the node.
Weight determination (i.e. reliability assignment) is essential
to the learning procedure, since the network learns by
adjusting the connection weights. Self-Gridron initializes
weights to small random real numbers between -1.0 and 1.0.
The combining function integrates the inputs: each node
combines input from previous Gridrons according to the
credibility-checking function, decides node activation, and
then transmits the stimuli based on the activation decision.
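The node model just described can be sketched as follows. This is an illustrative sketch only; the class and method names are ours, not the paper's, and the threshold semantics are an assumption based on the text.

```python
import random

class Gridron:
    """One node (desktop/volunteer) in the neural overlay.
    Illustrative sketch; names are hypothetical, not from the paper."""

    def __init__(self, n_inputs, threshold=0.0):
        # Weights are initialized to small random reals in [-1.0, 1.0],
        # as stated in the text.
        self.weights = [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        self.threshold = threshold

    def combine(self, inputs):
        # Combining function: sum of weighted inputs from previous Gridrons.
        return sum(w * x for w, x in zip(self.weights, inputs))

    def activate(self, inputs):
        # Credibility-checking activation: fire only if the combined
        # stimulus exceeds this node's threshold.
        return self.combine(inputs) > self.threshold
```

A node with weights [0.5, 0.5] and threshold 0 would, for instance, fire on inputs (1.0, 1.0) but not on (-1.0, -1.0).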
The Self-Gridron network architecture consists of several
layers: an input layer, hidden intermediate layers, and an
output layer. The input layer imports input data, the
intermediate hidden layers transmit the data, and the output
layer exports the results. The connections form a
hierarchically layered, directed structure.
Each hidden or output node of a multi-layered Self-
Gridron is designed to perform two computations. The
computation of the function signal for job allocation
appearing at the output of a node is expressed as a continuous
nonlinear function of the input signal and synaptic weights
associated with that node. The computation of an estimate of
the gradient vector, i.e. the gradients of the error surface with
respect to the weights connected to the inputs of a Gridron,
is required for the backward pass through the network.
The network topology (architecture of layers) is
important to provide a near-optimal solution. Since
constructing optimal architecture is an NP-complete problem,
topological settings, including the number of layers and the
number of nodes in each layer, and connections between
them are decided heuristically.
Each Gridron (i.e. node) constructs the neural overlay
architecture, in which a node with a job becomes an input
node and the nodes connected with it (i.e. the nodes in its
friends list) become intermediate nodes.
In Self-Gridron, each node has a threshold value providing a
basis for decision making. Through training, the 1st layer
comes to consist of the most reliable nodes, while the 2nd
layer is composed of backup nodes. The nodes in the 1st layer
distribute their jobs to the nodes in the 2nd layer. In the
distribution process, the weights on each edge and the
thresholds, which reflect previous successful job executions,
are measured to choose candidates for distribution. Self-
Gridron can use the execution results from the 2nd layer
if the nodes in the 1st layer fail to finish the assigned task.
Even if the nodes in both layers fail, the amount of
recalculation is small. This adaptive replication scheme
provides a self-adjusting network and system environment.
Based on the error signal received, connection weights are
then updated. The strength of back propagation lies in its
ability to compute an effective error for hidden layers and to
derive a learning rule for the weights.
The trainable neural overlay network, Self-Gridron,
provides learning ability by strengthening the learning ability
of a neural network with consideration of desktop Grid
computing characteristics. For Self-Gridron network learning,
the back propagation network is composed of a multi-layered
feed-forward network with a gradient-descent learning
algorithm that considers the availability and reliability of
each node. Throughout the learning process, weights are set
based on training patterns and desired outputs such that the
adjustment reduces errors (i.e. the difference between the
actual and desired outputs). A weight can have a positive
value for excitatory functions and a negative value for
inhibitory ones. Weights are adjusted based on the error,
which represents the accuracy of prediction, to minimize
error in the next learning period. Even with noise after
training or imperfect information, the system can still make
accurate decisions by reviewing errors from previous
experience and learning. In particular, Self-Gridron is robust
in the face of noise caused by partial network failure or
corruption from malicious nodes. The system offers fault
tolerance, as partial failure does not affect final decision
making.
Self-Gridron provides an adaptive and extensible network
structure based on the neural overlay network, and a reliable
scheduling scheme supporting allocation, replication, and
fault tolerance based on the threshold and weight values from
the network, in a dispersed manner. Furthermore, Self-Gridron
sustains self-organization through learning that adjusts link
strengths based on errors in previous outputs.
3.2 Operations
The problem of selection and replication becomes even
more complicated in fully decentralized desktop Grid
environments, in which there is no central server managing
the entire scheduling process and resource management. To
provide secure management in a fully decentralized desktop
Grid environment, we developed Self-Gridron on a neural
overlay network supporting a reliable scheduling and efficient
replication scheme.
There are two steps in operating Self-Gridron: training and
scheduling. In the training step, the nodes are connected to
each other according to their friend lists. The initial weights
are assigned random values, and the weights (i.e. connection
strengths specifying degrees of trustworthiness) are attuned
through learning. The input node allocates test tasks to the
nodes connected with it in the intermediate layers. The tasks
are propagated to the following layers repeatedly until there
is no task left or a task reaches a node in the output layer.
Each node holding a task executes it.
Each step has two phases: feed-forward and feed-backward.
In the forward phase, the system provides input patterns to
each node, and each node produces output using its inputs
and activation function; the network indicates the learning
pattern. In the output layer, the initial weight is recalculated
by considering the error between the actual output and the
target output. In the backward phase, the weights are updated,
reflecting the error in the backward direction: the error
propagates backward while adjusting the weights connecting
the output, hidden, and input layers. The forward and
backward processes are repeated until the system reaches a
steady state and the error criteria are met. The back
propagation algorithm is a non-linear extension of the LMS
algorithm, applying a modified delta rule repeatedly within a
stochastic-approximation framework.
The results of each task are propagated to the output nodes.
The output nodes verify the task results, and the evaluation
and error are propagated backward to adjust the weights.
Reliable and appropriate weights are decided through
repeated trial and error, ensuring learning.
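The repeated forward/backward cycle above can be illustrated with a toy single-weight model. This is a deliberately minimal sketch of the general idea, not the paper's multi-layer algorithm; the function name and stopping criterion are ours.

```python
def train_until_steady(w0, patterns, lr=0.1, tol=1e-3, max_epochs=600):
    """Repeat the forward and backward phases until the average error
    meets the criterion. Toy model: output = w * input, with a target
    per pattern (an illustrative sketch, not the full system)."""
    w = w0
    for _ in range(max_epochs):
        total = 0.0
        for x, t in patterns:
            o = w * x              # forward phase: produce the output
            e = t - o              # error between target and actual output
            w += lr * e * x        # backward phase: adjust weight by error
            total += 0.5 * e * e
        if total / len(patterns) < tol:   # steady state / error criterion
            break
    return w
```

On patterns generated by t = 0.8x, the weight converges toward 0.8 after repeated epochs.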
Each processing Gridron has a credibility value. In the actual
scheduling step, which follows the training step, the input
nodes allocate tasks according to credibility. To distinguish
each link, Self-Gridron provides a threshold to compare with
the weights; the threshold value is handed to the scheduling
step from the training step. A link with a weight higher than
the threshold becomes an allocation link, whereas a link with
a weight less than the given threshold becomes a replication
link.
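The link classification just described can be sketched as follows. The function name is ours; the default threshold is the value reported in Section 4 as working well, used here purely for illustration.

```python
def classify_links(link_weights, threshold=0.637):
    """Split links by weight: above the threshold -> allocation link,
    at or below it -> replication link. Illustrative sketch; 0.637 is
    the credibility threshold the experiments report as effective."""
    allocation = [n for n, w in link_weights.items() if w > threshold]
    replication = [n for n, w in link_weights.items() if w <= threshold]
    return allocation, replication
```

For example, links weighted {0.9, 0.3, 0.7} split into allocation links {0.9, 0.7} and a replication link {0.3}.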
In the forward phase, Self-Gridron computes the output,
where indices i, j, and k refer to different nodes in the
network, with signals propagating through the network from
the bottom upward. Node j lies in a layer above node i when
node i is an input unit and node j is a hidden unit.
To make a decision on strength adjustment, the system
must know the error of each node. $t_{pj}$ is a target output
(i.e. correct or ideal output), $o_{pj}$ is the output produced
by the network, and $i_{pj}$ is an input. The output node
error is given by

$\delta_{pj} = t_{pj} - o_{pj}$
Eq. 1 shows the general weight adjustment providing a new
weight, given as

$W_{ji}^{new} = W_{ji}^{old} + l \, a_i o_j \qquad (1)$
where $W_{ji}^{old}$ is the weight before adjustment and
$W_{ji}^{new}$ is the renewed weight, which combines the
former weight with an adjustment based on accuracy. To find
error minima, Self-Gridron applies gradient descent, the
result of a partial differentiation that adds or subtracts as
much as $\Delta w$. The adjustment of weights is essential to
increase accuracy. $l$ is a learning constant indicating the
learning rate; it is one of the effective factors in training,
regulating the relative size of the change in weights
($0 < l < 1$). $a_i$ is the activation value of node $i$, which
is the output of node $i$. Likewise, $o_j$ is the output of node
$j$ (and the input of the next node). If there is no activation,
there is no change; when both $a_i$ and $o_j$ are active at
the same time, the learning rate $l$ is applied. According to
the sigmoidal nonlinearity (the logistic function is applied),
the output of node $j$ is

$o_{pj} = \dfrac{1}{1 + \exp(-net_{pj})}$
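Eq. 1 and the logistic output can be written out directly. A minimal sketch of these two formulas alone, under our own function names, not the full Self-Gridron update:

```python
import math

def sigmoid(net):
    # Logistic nonlinearity: o_pj = 1 / (1 + exp(-net_pj))
    return 1.0 / (1.0 + math.exp(-net))

def weight_update(w_old, l, a_i, o_j):
    # Eq. 1: W_ji_new = W_ji_old + l * a_i * o_j
    # l is the learning rate (0 < l < 1); if either activation is 0,
    # the weight does not change.
    return w_old + l * a_i * o_j
```

Note that sigmoid(0) = 0.5, and a zero activation on either side leaves the weight untouched, matching the "no activation, no change" rule above.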
Self-Gridron applies a modified delta rule that makes
the network learn by adjusting connection weights while
considering resource (i.e. desktop) features. The modified
delta rule presents input patterns at the input layer, which
then drive the neural network. Next, the system regulates the
weights according to the delta rule until learning is complete.
As given in Eq. 2, weight adjustment (i.e. weighted error) is
done by error correction, together with consideration of the
credibility of the Gridron, given by Eq. 3, to reflect resource
characteristics. The new weight, decided by the credibility of
the Gridron, differentiates Self-Gridron from typical neural
networks by considering desktop Grid computing attributes.

$W_{ji}^{new} = W_{ji}^{old} + l \, e_j a_i \hat{C}_j, \quad \text{where } e_j = t_j - o_j \qquad (2)$
where $\hat{C}_j$ indicates the credibility and $e_j$ is the
error, the difference between the actual output $o_j$ of node
$j$ and the target output $t_j$. If any one of $l$, $e_j$, $a_i$
is 0, the adjustment becomes 0 and there is neither an
increase nor a decrease. Since convergence will never take
place when $l$ becomes too large, the system usually applies
a heuristic approach, keeping weight changes within a small
portion of the current value. There is no adjustment if there
is no error; only when nodes are activated is the weight
changed.

$\hat{C}_j = A_j \times E_j, \quad \text{where } A_j = \dfrac{MTTF}{MTTF + MTTR} \text{ and } E_j = P \log P \qquad (3)$
In Eq. 3, Self-Gridron adjusts the system appropriately.
$\hat{C}_j$, the credibility, implies the degree of
trustworthiness of a Gridron: the extent to which the desktop
is available and the degree to which it acts predictably.
Credibility is indicated by the degree of accessibility (i.e.
availability) and predictability (i.e. entropy). $A_j$,
availability, represents readiness, the extent to which the
resource is available. Typically, availability is designated by
the ratio of the mean time to failure (MTTF) to the total
service time (i.e. the sum of MTTF and the mean time to
repair, MTTR). $E_j$, entropy, indicates the degree of
disambiguation, which measures how predictable the desktop
is.
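Eq. 2 and Eq. 3 can be sketched as below. The function names are ours, and how the entropy term $E_j$ is computed and normalized is not fully specified in the text, so it is passed in as a precomputed value here.

```python
def availability(mttf, mttr):
    # A_j = MTTF / (MTTF + MTTR): the fraction of service time
    # the desktop is available.
    return mttf / (mttf + mttr)

def credibility(a_j, e_j):
    # Eq. 3: credibility = availability * entropy-based predictability.
    # e_j's exact computation is an assumption left to the caller.
    return a_j * e_j

def credibility_update(w_old, l, err_j, a_i, cred_j):
    # Eq. 2: W_ji_new = W_ji_old + l * e_j * a_i * C_j,
    # with err_j = t_j - o_j. If l, err_j, or a_i is 0,
    # the weight is unchanged.
    return w_old + l * err_j * a_i * cred_j
```

For a desktop with MTTF = 99 and MTTR = 1 (same time units), availability is 0.99; a zero error leaves the weight exactly as it was.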
We introduced the LMS algorithm to provide a gradient
descent method that reduces the total error over the entire
learning data. The mean square difference minimizes the cost
function to reduce variation between real productivity and
ideal output. LMS is one of the methods to identify the
desired weight vector. The error is calculated by

$E = \frac{1}{2} e_j^2, \qquad E_{avg} = \frac{1}{N} \sum_{n=1}^{N} e_j$
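Written out directly, the per-pattern and averaged error above are simply (function names are ours):

```python
def instantaneous_error(e_j):
    # E = (1/2) * e_j^2, the squared error for one pattern.
    return 0.5 * e_j ** 2

def average_error(errors):
    # E_avg = (1/N) * sum over N patterns.
    return sum(errors) / len(errors)
```

For example, an error of 2.0 on one pattern gives an instantaneous error of 2.0, and errors [1.0, 3.0] average to 2.0.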
The modified delta rule calculates strength, thereby
minimizing error by considering desktop credibility. The
method evaluates weight variations proportionally from the
differentiated weights. Activation is complete when the sum
of imported inputs exceeds a certain threshold. The
large-scale parallel information processing behaves as
distributed knowledge.
The essence of the learning process is the adjustment of
connection strength between nodes. The weights are updated
after each training pattern is presented. Through learning,
the weight is optimized based on credibility $\hat{C}_j$
(i.e. a trustworthiness value indicating the extent to which the
node is reliable) to improve system reliability.
Through this process, the Self-Gridron neural overlay
network learns by adjusting connection strengths to generate
the correct output. For all sets of the learning pattern, the
error fine-tuning is repeated.
4 Experimentation and Evaluation
To evaluate the proposed Self-Gridron, we simulated
Self-Gridron with its back propagation algorithm and
modified delta rule. In our simulation, two hidden layers
were used and 1000 tasks were run to test the system. The
time taken to complete the tasks was measured while
adjusting several parameters, and the probability of correct
execution (i.e. successful execution rate) was measured.
Each set was trained 600 times with a small learning rate,
l = 0.1. In addition, the credibility threshold for
distinguishing allocation links from replication links
performed well at $\hat{C}_{thr} = 0.637$.
Fig. 1 presents the simulation results for the Mean Square
Error (MSE) and the probability of correct execution,
respectively, as the number of replication nodes is adjusted,
with learning rate l = 0.1 and credibility threshold
$\hat{C}_{thr} = 0.637$.
[Figure 1: plot of Mean Square Error (y-axis, approximately 0.20 to 0.32) versus the number of replicas (x-axis, 1 to 12).]
Figure 1. Mean square error for each number of replicas
5 Conclusions
In this paper, we proposed a reliable, autonomous and
fully distributed desktop Grid computing system, Self-
Gridron based on neural overlay network, and scheduling
scheme. Self-Gridron improves system stability and reliability
in terms of construction, learning, adjustment, and
management. Furthermore, the system supports
self-organization, self-management, and self-learning to
administer the system and network autonomously. This
self-governing characteristic makes Self-Gridron well suited
for fully distributed desktop Grid computing systems with
reliable and autonomous system management. Additionally, a
reliable scheduling scheme based on (i) adjustment of
weights (i.e. credibility) and threshold that considers
availability and predictability (i.e. inherent characteristics of
desktop Grid computing) and (ii) a repetitious learning
process, improves system dependability.
Self-Gridron exploits the neural overlay network
structure to create an autonomous system that is adaptive and
supports a reliable scheduling scheme, including job allocation,
replication, reallocation, and management. The experimental
results demonstrated the extent of stable and reliable Self-
Gridron operation and improved performance through
evaluation of scheduling schemes in various simulations.
6 Acknowledgement
This work was supported by the Korea Research
Foundation Grant funded by the Korean Government
(MOEHRD) (KRF-2007-314-D00223).
7 References
[1] F. Berman, G. C. Fox, A.J.G. Hey, “Grid computing:
Making the global infrastructure a Reality”, Wiley, 2003.
[2] I. Foster, A. Iamnitchi, “On death, taxes, and the
convergence of peer-to-peer and grid computing”, 2nd Int.
workshop on Peer-to-Peer Systems. LNCS 2735:118–128,
2003.
[3] L. Sarmenta, S. Hirano, “Bayanihan: Building and
studying volunteer computing systems using java”, Future
Generation Computer Systems vol. 15 no.5-6, pp.675–686,
1999.
[4] D. Anderson, “BOINC: A system for public-resource
computing and storage”, 5th IEEE/ACM Int. Workshop on
Grid Computing, 2004.
[5] SETI@home, http://setiathome.ssl.berkeley.edu
[6] A. Baratloo, M. Karaul, Z. Kedem, P. Wijckoff,
“Charlotte: metacomputing on the web”, Future Generation
Computer Systems, vol.15 no.5–6, pp.559–570, 1999.
[7] A. Chien, B. Calder, S. Elbert, K. Bhatia, “Entropia:
Architecture and performance of an enterprise desktop grid
system”, Journal of Parallel and Distributed Computing.
vol.63 no.5, pp.597–610, 2003.
[8] G. Fedak, C. Germain, V. Neri, F. Cappello,
“XtremWeb: A generic global computing system”, 1st IEEE
Int. Symposium on Cluster Computing and the Grid:
Workshop on Global Computing on Personal Devices, 2001.
[9] O. Lodygensky, G. Fedak, F. Cappello, V. Neri, M.
Livny , D. Thain, “XtremWeb & Condor: sharing resources
between internet connected condor pool”, 3rd IEEE/ACM Int.
Symposium on Cluster Computing and the Grid: Workshop
on Global and Peer-to-Peer Computing on Large Scale
Distributed Systems, 2003.
[10] O. Neary, S. Brydon, P. Kmiec, S. Rollins, P. Cappello,
“Javelin++: scalability issues in global computing”,
Concurrency: Practice and Experience vol.12 no.8, pp.727–
735, 2000.
[11] H. Cao, Z. Yu, Y. Wang, “A Chaotic Neural Network
Algorithm for Task Scheduling in Overlay Grid”, SKG 2005.
[12] E. Schikuta and T. Weishaupl, "N2Grid: neural
networks in the grid", Proceedings of 2004 IEEE Int. Joint
Conference on Neural Networks, 2004.
[13] L. Krammer, E. Schikuta, H. Wanek, "A grid based
neural network execution service", Proceedings of the 24th
IASTED Int. conference on Parallel and distributed
computing and networks, 2006.
[14] I. Foster, A. Iamnitchi, “On death, taxes, and the
convergence of peer-to-peer and grid computing”, 2nd
Int. Workshop on Peer-to-Peer Systems. LNCS 2735:118–128,
2003.
[15] S. Choi, M. Baik, J. Gil, S. Jung, and C. S. Hwang,
“Adaptive Group Scheduling Mechanism using Mobile
Agents in Peer-to-Peer Grid Computing Environment”,
Applied Intelligence, vol. 25 no.2, pp.199–221, 2006.
[16] Korea@Home, http://www.koreaathome.org/eng/
[17] A. J. Chakravarti, G. Baumgartner and M. Lauria,
“Self-Organizing Scheduling on the Organic Grid”, Int. Journal
of High Performance Computing Applications, 2006.
[18] D. Zhou, V. Lo, “Wave scheduler: scheduling for faster
turnaround time in peer-based desktop grid systems”, 11th
Workshop on Job Scheduling Strategies for Parallel
Processing, 2005.
[19] A. Montresor, H. Meling and O. Babaoglu, “Messor:
Load-Balancing through a Swarm of Autonomous
Agents”, 1st Int. Workshop on Agents and Peer-to-Peer
Computing, 2002.
[20] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S.
Shenker, “A scalable content-addressable network”, ACM
SIGCOMM, 2001.
[21] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H.
Balakrishnan, “Chord: A scalable peer-to-peer lookup service
for Internet applications”, ACM SIGCOMM, 2001.
[22] L. Fausett, Fundamentals of Neural Networks, Prentice-
Hall, 1994.
[23] K. Gurney, An Introduction to Neural Networks, UCL
Press, 1997.
[24] S. Haykin, Neural Networks, 2nd Edition, Prentice Hall,
1999.
[25] Korea@Home, http://www.koreaathome.org/eng/
[26] R. Rahman, K. Parker, R.Alhajj, “A predictive
technique for replica selection in Grid environment”, IEEE
CCGRID 2007.
[27] B. Y. Zhao, J. Kubiatowicz, and A. Joseph. “Tapestry:
An infrastructure for fault-tolerant wide-area location and
routing,” Technical Report UCB/CSD-01-1141, University of
California at Berkeley, Computer Science Department, 2001.
[28] A. Rowstron and P. Druschel, “Pastry: Scalable,
decentralized object location and routing for large-scale
peer-to-peer systems”, Middleware 2001, 2001.
[29] L. Zhong, D. Wen, Z. Wei Ming, Z. Peng,
“Paradropper: A General-Purpose Global Computing
Environment Built on Peer-to-Peer Overlay Network”,
ICDCSW'03, 2003.