ZIVIS: A City Computing Platform Based on Volunteer Computing
B. Antolí¹, F. Castejón², A. Giner¹, G. Losilla¹, J.M. Reynolds¹, A. Rivero¹, S. Sangiao¹, F. Serrano¹, A. Tarancón¹, R. Vallés¹ and J.L. Velasco¹
¹ Institute for Biocomputation and Physics of Complex Systems (BIFI-UZ), Zaragoza, Spain
² Laboratorio Nacional de Fusión (LNF-CIEMAT), Madrid, Spain
Abstract
Volunteer computing has emerged as a new form of distributed computing. Unlike other computing paradigms such as Grids, which tend to be built on complex architectures, volunteer computing has demonstrated a great ability to integrate dispersed, heterogeneous computing resources with ease. This article presents ZIVIS, a project which aims to deploy a city-wide computing platform in Zaragoza (Spain). ZIVIS is based on BOINC (Berkeley Open Infrastructure for Network Computing), a popular open source framework for deploying volunteer and desktop grid computing systems. A scientific code which simulates the trajectories of particles moving inside a stellarator fusion device has been chosen as the pilot application of the project. In this paper we describe the approach followed to port the code to the BOINC framework, as well as some novel techniques, based on standard Grid protocols, that we have used to access the output data on the BOINC server from a remote visualizer.
1. Introduction
The popularization of Information and Communication Technologies has made personal computers a common tool in most working environments and in the homes of the general public. This represents a vast, potential, unused computing resource, superior in computing power and storage capacity to any present-day supercomputing center or other cutting-edge computing facility.
The advent of high-speed networks has brought new forms of distributed computing which try to coordinate dispersed, heterogeneous computing resources in a decentralized manner. One of these new paradigms is Grid Computing, whose model tries to integrate geographically dispersed computational resources into a virtual infrastructure which, for the end user, acts as a single, huge and powerful computer.
However, current grid middleware does not seem to be suitable for the “average home PC” referred to in the first paragraph; Grids typically involve clusters and other organizationally owned hardware, managed by specialized staff and available 24 hours a day. Moreover, the requirements for achieving this “abstraction” of resources have led grid middleware to adopt a complex architecture which is definitely not appropriate for deployment on general-public PCs.
As an alternative, volunteer computing (also known as
“Desktop Grid” or “Public-Resource Computing”)
enables distributed computing platforms where cycles
are scavenged from idle desktop computers. In contrast
to “classical” Grid systems, Desktop Grids need to
deploy just a thin layer of abstraction over the
resources they manage. BOINC [1] (“Berkeley Open
Infrastructure for Network Computing”) has become
the de facto standard volunteer computing software.
ZIVIS is an initiative which aims to establish in Zaragoza (Spain) the first “city-wide supercomputer”. The objective is to integrate as many computers as possible into a virtual computing platform in the metropolitan area of Zaragoza (650,000 inhabitants). This is achieved through volunteer computing, namely using an adapted version of the open source software BOINC.
The project is run by the Zaragoza city council [2] and
the Institute for Biocomputation and Physics of
Complex Systems [3] of the University of Zaragoza,
which will provide the scientific applications to be run
on the platform.
Although citizens are expected to participate in the project in an altruistic manner (motivated by the “fair cause” character of the chosen applications), some promotional events have been scheduled. Specifically, after the public presentation of the project there will be a five-week production period, after which the top ten users in terms of contributed CPU time will be rewarded in a festive event with demos and a presentation of results. As an additional way to encourage the involvement of citizens, there will be a parallel ranking of users grouped by district.
A secondary objective of the project is to raise public awareness of science, fostering citizens' involvement in a scientific research project.
1.1. BOINC
The Berkeley Open Infrastructure for Network Computing (BOINC) is a framework for deploying distributed computing platforms based on volunteer computing. It is developed at the U.C. Berkeley Space Sciences Laboratory and is released under an open source license.
BOINC is the evolution of the original SETI@home
project, which started in 1999 and attracted millions of
participants worldwide.
The BOINC software is divided into two main components: a server side and a client side. BOINC allows computing resources to be shared among different autonomous projects. A BOINC project is identified by a unique URL, which BOINC clients use to register with it. Every BOINC project must run a host with the server side of the BOINC software.
BOINC software on the server side comprises several
components: one or more scheduling servers that
communicate with BOINC clients, one or more data
servers that distribute input files and collect output
files, a web interface for participants and project
administrators and a relational database that stores
information about work, results, and participants.
The BOINC software provides powerful tools to manage the applications run by a project. For instance, it makes it easy to define different application versions for different target architectures. A “workunit” describes a computation to be performed, associating a unique name with an application and the corresponding input files.
Not all kinds of applications are suitable for deployment on BOINC. Ideally, candidate applications must present “independent parallelism” (being divisible into parallel parts with few or no data dependencies) and a low data/compute ratio, since output files will be sent through a typically slow commercial Internet connection.
The workflow of a BOINC system is very simple. When the BOINC client notices that the local computer has been idle for a while, it launches a screensaver and contacts one of the project servers it is registered with to retrieve a job to execute (pull approach). When the client finishes the downloaded workunit, it contacts the project server to upload the result and asks the same or a different server for a new workload. If at any point of this process the user returns to the computer, the workunit computation and the screensaver are stopped immediately.
Volunteer computing must deal with erroneous results caused by computer malfunction and, occasionally, by malicious participants. BOINC addresses this problem by providing a simple redundancy mechanism for identifying and rejecting erroneous results.
1.2. Pilot application: ISDEP
In order to involve the highest number of citizens in the project, the application chosen to run on the ZIVIS infrastructure had to be seen as interesting and worthwhile by the general public. This is the case for research into alternative energy sources such as fusion, which is widely regarded as a future energy source. BIFI has well-known expertise in fusion research; for that reason the code selected as the pilot application for the ZIVIS project was ISDEP [4].
ISDEP (“Integrator of Stochastic Differential Equations in Plasmas”) is a fusion plasma application, written and highly optimized in C, which calculates the trajectories of the particles inside a fusion device. This environment is read from several input files, which include the geometry of the vacuum chamber of the fusion device, the magnetic field created by the coils, and the electrostatic and particle (electron and ion) density profiles. As a result, the same calculation core can be used for a stellarator fusion device or for a tokamak.
The magnetic field is read from a grid (see Guasp et al. [5]) and interpolated at simulation time. The application calculates the next position of each particle along its trajectory as a function of the previous one, taking into consideration the environment and other initial parameters of the particles. These parameters include the possibility of collisions between particles during the simulation and the electric and magnetic fields inside the device. These effects can be independently switched on or off by the user in order to assess their influence on particle evolution.
In fact, the calculation of the positions of the particles along each trajectory during the simulation implies solving a set of stochastic differential equations which govern the evolution of the plasma. The numerical algorithm used to solve them (by Kloeden and Pearson [6]) is of the Runge-Kutta type, extended so that it can deal with Gaussian noise (the noise caused by the collisions of the particles with ions and electrons inside the device). In the end, the core computes a large number of particle trajectories (typically about one million) and obtains averages of the relevant magnitudes, such as densities, temperatures and particle fluxes.
The generation of workunits for ISDEP will proceed in two phases. In a first round (the first four weeks of the production period), clients will calculate a fixed number of trajectories and will take measurements over them, essentially sampling positions at a given time interval. They will also save one complete trajectory for later visualization during the public demonstration event. During the second phase (the fifth week), only sampled data will be calculated.
2. Architecture
The infrastructure deployed for the project comprises two 64-bit dual-core nodes acting as BOINC servers, both connected to the Internet through a 100 Mbit/s connection. The machines are connected to each other through a direct 1 Gbit/s link.
The design of the BOINC server software allows a project with tens of thousands of volunteers to be run on a single server computer. However, in order to prevent the capacity of our server from eventually being exceeded, we decided to balance the load by spreading the BOINC services across two hosts.
When we initially created the BOINC project using make_project, everything ran on a single host: web server, scheduling server, daemons, tasks, MySQL database server and file upload handler. Among these elements, the MySQL server does the heaviest work. Therefore, the first step to increase the capacity of our project was to move the MySQL server to the host which does not run the BOINC server. This configuration is specified in the project configuration file. The MySQL database is replicated to the other node by sharing the MySQL data directory through NFS.
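The relevant fragment of that configuration file might look roughly as follows. This is a sketch only: the element names reflect our reading of the BOINC server configuration format, and the host name, database name and credentials are placeholders, not our actual settings:

```xml
<boinc>
  <config>
    <!-- point the BOINC daemons at the MySQL server on the second host -->
    <db_host>node2.example.org</db_host>
    <db_name>zivis</db_name>
    <db_user>boincadm</db_user>
    <db_passwd>secret</db_passwd>
  </config>
</boinc>
```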
Because of this partial NFS setup between the nodes, the project's data is effectively shared and, as a result, each machine is capable of taking over in case of the other's failure.
The master URL of the project is common to both machines, and load balancing is performed by the Apache servers on both hosts. As a result, both machines run the web services, and both can schedule work and access the MySQL database.
Furthermore, the BOINC servers are connected to a remote visualizer through a dedicated 1 Gbit/s dark-fiber channel. This visualizer will be used to show the results of the first five weeks of operation of the project at a final public event.
With this infrastructure, our project will soon be available for spreading tasks among participating computers and receiving their results.
3. Application integration
To be added to a BOINC project, applications must incorporate some interaction with the BOINC client: they must notify the client when they start and finish, and they must allow for the renaming of any associated data files, so that the client can relocate them in the appropriate part of the guest operating system and avoid conflicts with workunits from other projects.
Moreover, the approach to workunits differs from that of standard batch systems. In a recurrent batch queue, the application does some work until it is terminated (or killed) and requeued, and then restarts in the same environment. Usually the application checks which was the last output file (or chunk of data) generated and restarts from it, accumulating a series of result files in the local directory.
On the other hand, when running on BOINC, kills are to be expected but output files cannot be accumulated locally. Thus, some retuning of the restart procedure may be necessary, depending on the original workflow. For instance, it is fine to check the integrity of a given output file, but one should not rely on the contents of a directory. It is also not advisable to generate too many output files in a single workunit.
BOINC usually sends the same workunit to several different machines and compares the results to validate them. If the output is very large, it is likely that some of the machines will fail to send it back, invalidating the whole workunit. Similarly, if the workunit takes too long, it is likely to be aborted on some of the machines.
An API is provided to carry out the required tasks. It is a C++ interface, but we have found no problem mixing its g++-compiled code with the original gcc-compiled ISDEP source. To meet the minimum requirements named above, one must incorporate calls to boinc_init, boinc_finish and boinc_resolve_filename, together with their required libraries. Note that there exists a workaround to exec applications for which the source code is not available, but it is always preferable to incorporate the required calls into the source of the application, as we have done.

Figure 1: Overview of ZIVIS infrastructure
Once the data has been collected back on the server, some processing must be done to validate it. In our case, we allow different hardware to execute the same workunit, so the comparison is of a statistical character. For the particle trajectories calculated by different computers for the same workunit, the initial positions must be identical across all replicas, while later positions must be reasonably similar. On the other hand, when only integer results are expected, or when the server has been asked to assign only similar hardware, it is possible to use one of the validators already provided with the examples.
4. Remote visualization
Visualization is a powerful method for analyzing simulation results. With this in mind, we have developed a graphical interface for ISDEP. This graphical application was programmed using C++ and the Fox Toolkit library as the GUI development framework. It also makes use of GLUT and Mesa (a free/open source, fully software-based graphics library that is code-compatible with OpenGL) to represent the data graphically.
The GUI was designed following user-friendliness and robustness interaction guidelines, and it is simple for new users to start working with the application. Several initial and interactive parameters can be selected from the GUI. Moreover, other initial parameters, such as the direction, velocity and initial point (in 3D space) from which to launch particles, can easily be set using the mouse.
In order to present results in an eye-catching manner,
we have developed a modified version of the visualizer
which will be used during the public event to be held at
the end of the production period. The changes
introduced involve accessing the output data stored in
the remote BOINC servers. For such access we have
used GridFTP.
GridFTP is a standard Grid protocol which provides secure, robust, fast and efficient transfer of (especially bulk) data. We have used the implementation included in Globus Toolkit v4, namely its API for developing custom clients.
5. Conclusion
This paper presents ZIVIS, a project which aims to deploy the first metropolitan supercomputer based on volunteer computing. We have described the technologies the project relies on and the work done so far.
ZIVIS is conceived as a long-term project to deploy
a stable production-level computing infrastructure in
the metropolitan area of Zaragoza. In fact, there are
already plans to deploy new applications other than
ISDEP after the initial pilot period.
As a promising future line of work, we plan to develop a Grid interface for the BOINC server (a kind of “Globus job manager”). This would enable the integration of ZIVIS resources into any classical Grid infrastructure and would allow them to be accessed using standard Globus clients. We know that some initial prototypes following this approach have already been implemented.
Another idea we find very interesting is the incorporation of virtualization into the ZIVIS middleware. This would allow applications to be executed on all architectures, even in cases where code migration is not feasible for technical or licensing reasons.
References
[1] D. P. Anderson. BOINC: A System for Public-Resource
Computing and Storage. 5th IEEE/ACM International
Workshop on Grid Computing. November 8, 2004,
Pittsburgh, USA.
[2] Zaragoza City Council. http://www.zaragoza.es
[3] Institute for Biocomputation and Physics of Complex
Systems. http://bifi.unizar.es
[4] F. Castejón et al. “Ion kinetic transport in the presence of collisions and electric field in TJ-II ECRH plasmas”. Submitted to PPCF for publication.
[5] J. Guasp and M. Liniers. Nuclear Fusion 40 (2000) 397.
[6] P. E. Kloeden and R. A. Pearson. “The numerical solution of stochastic differential equations”. J. Austral. Math. Soc., Ser. B, 20 (1977), pp. 8-12.