WAMM (Wide Area Metacomputer Manager):
A Visual Interface for Managing Metacomputers
Version 1.0
Ranieri Baraglia, Gianluca Faieta, Marcello Formica,
Domenico Laforenza
CNUCE - Institute of the Italian National Research Council
Via S.Maria, 36 - I56100 Pisa, Italy
Tel. +39-50-593111 - Fax +39-50-904052
email: R.Baraglia@cnuce.cnr.it, D.Laforenza@cnuce.cnr.it
meta@calpar.cnuce.cnr.it
Contents
1 Introduction
  1.1 Metacomputer
2 Metacomputing environments
3 Design goals
4 WAMM
  4.1 Configuration
  4.2 Activation
  4.3 Windows
  4.4 Compilation
  4.5 Tasks
5 WAMM's implementation issues
  5.1 Structure of the program
  5.2 Host control
  5.3 Task control
  5.4 Remote commands
  5.5 Remote compilation
6 Future developments
7 Related work
References
1 Introduction
Recent years have seen a considerable increase in computer performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems in science and engineering that cannot be tackled with currently available supercomputers: their size and complexity require a computing power considerably higher than that available in any single machine. Dealing with such problems would require concentrating several supercomputers on the same site to obtain the total power needed, which is obviously unfeasible in terms of both logistics and economics.
For a few years now, some important research centers (mostly in the USA) have been experimenting with the cooperative use, via network, of geographically distributed computational resources. Several terms have been coined in connection with this approach, such as Metacomputing [1], Heterogeneous Computing [3], Distributed Heterogeneous Supercomputing [2], Network Computing, etc. One of the main reasons for introducing computer networks was to allow researchers to carry out their work wherever they pleased, by giving them rapid and transparent access to geographically distributed computing tools. Technological advances and the increasing diffusion of networks (originally used for file transfer, electronic mail and then remote login) now make it possible to achieve another interesting goal: to treat multiple resources distributed over a network as a single computer, that is, a metacomputer.
1.1 Metacomputer
A metacomputer is very different from a typical parallel MIMD-DM machine (e.g. Thinking Machines CM-5, nCUBE 2, IBM SP2). Generally, a MIMD computer consists of tightly coupled processing nodes of the same type, size and power, whereas in a metacomputer the resources are loosely coupled and heterogeneous. Each resource can efficiently perform specific tasks (calculation, storage, rendering, etc.), and, in these terms, each machine can execute the piece of an application best suited to it. It is thus possible to exploit the affinity between software modules and architectural classes. For example, in a metacomputer containing a Connection Machine (CM-2) and an IBM SP2, an application that can be partitioned into two components, one data parallel, the other a coarse-grain task farm, would naturally exploit the features of both machines.
Metacomputing is now certainly feasible and could be an economically viable way to deal with some complex computational problems (not only technical and scientific ones), as a valid alternative to extremely costly traditional supercomputers. Metacomputing is still at an early stage and more research is necessary in several scientific and technological areas, for example:
1. methodologies and tools for the analysis, parallelization and distribution of an application on a metacomputer;
2. algorithms for process-processor allocation and load balancing in a heterogeneous environment;
3. user-friendly interfaces to manage and program metacomputers;
4. fault-tolerance and security of metacomputing environments;
5. high performance networks.
2 Metacomputing environments
Developing metacomputing environments entails resolving several problems, both hardware (networks, mass memories with parallel access, etc.) and software (languages, development environments, resource management tools, etc.). Although many of the hardware problems are close to a solution, the software problems are still far from being resolved.
Currently available development environments generally have tools for managing the resources of a metacomputer, but often do not have adequate tools for designing and writing programs. Without such tools, the software design cycle for metacomputers can run into considerable difficulties. Typically, building an application for a metacomputer involves the following steps:
1. the user writes the source files on a local node of the metacomputer;
2. the source files are transferred to every node;
3. the corresponding compilation is made on all nodes;
4. if errors are detected or modifications are made, all the previous steps are repeated.
Compilation is needed on each node since it is not possible to determine a priori on which machines the modules that make up the application will be run. If the right tools are not available, the user has to manually transfer source files and execute the corresponding compilation on all the nodes; such operations have to be repeated whenever even the smallest error needs to be corrected. Therefore, even with just a few machines, a method to make these operations automatic is essential.
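To make the manual cycle concrete, here is a minimal sketch in C (it is not part of WAMM) of steps 2 and 3 above for a hard-coded host list; the host names and paths are illustrative, and password-less rcp/rsh access is assumed:

/* manual_build.c - what a user must otherwise do by hand for every change:
 * copy the sources to each node and rebuild there.  Host names and paths
 * are hypothetical; password-less rcp/rsh access is assumed. */
#include <stdio.h>
#include <stdlib.h>

static const char *hosts[] = { "calpar.cnuce.cnr.it", "cibs.sns.it" };

int main(void)
{
    char   cmd[512];
    size_t i, n = sizeof(hosts) / sizeof(hosts[0]);

    for (i = 0; i < n; i++) {
        /* step 2: transfer the source files to the remote node */
        snprintf(cmd, sizeof(cmd), "rcp -r ./src %s:build_tmp", hosts[i]);
        if (system(cmd) != 0) {
            fprintf(stderr, "transfer to %s failed\n", hosts[i]);
            continue;
        }
        /* step 3: run the compilation on that node */
        snprintf(cmd, sizeof(cmd), "rsh %s \"cd build_tmp/src && make\"", hosts[i]);
        if (system(cmd) != 0)
            fprintf(stderr, "make on %s failed\n", hosts[i]);
    }
    return 0;
}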
Some metacomputing environments provide tools to control certain aspects of the configuration and management of the virtual machine, such as the activation, insertion and removal of nodes (e.g. the PVM console [11]). In certain cases, however, easier to use management tools are needed, above all when large metacomputers with several nodes are being worked on.
To alleviate these problems, we have developed a graphical interface based on OSF/Motif [21] and PVM, which simplifies the operations normally carried out to build and use metacomputer applications, as well as the management of parallel virtual machines.
3 Design goals
This section describes the guidelines we followed in designing the interface. We believe they are general enough to be applicable to any development tool for metacomputing.
Ease of use of a metacomputer.
The main aim of the interface should be to simplify the use of a metacomputer. This entails giving the user an overall view of the system, especially if there are many nodes spread out over several sites. At the same time, the individual resources should be easily identifiable. Although a simple list of the network addresses of the machines would probably be the fastest method to identify and access a particular node, it is better to group machines by following some precise criteria, so as to facilitate user exploration of the resources available on the network.
In addition, the interface should let users work above all on the local node; operations that need to be carried out on remote nodes should be executed automatically. Thus, developing software for metacomputers will mainly require the same tools used to write and set up sequential programs (editor, make, etc.). In this way, the impact of a new programming environment will be less problematic.
The use of metacomputers cannot be simplified if the tools themselves are not easy to use and intuitive. It is well known that Graphical User Interfaces (GUIs) have gained the favor of computer users. We therefore decided to develop our interface as an X11 program, thus allowing users to access its functionality via windows, menus and icons. This requires the use of graphical terminals, but it saves users from having to learn new commands, keyboard shortcuts, etc.
System control.
When working with a metacomputer, especially if a low level programming environment such as PVM is used, it may be difficult to control the operations that occur on remote nodes, and inexperienced users can be discouraged. An interface for programming and using a metacomputer should offer users as much information as possible, and full control over what happens in the system. For example, users should never get into situations where they do not know what is happening on a certain node. If problems arise, they should be reported with complete messages and not with cryptic error codes. If a problem is so serious that the interface can no longer be used, the program must exit tidily.
Virtual Machine management.
The interface must have a set of basic functions to manage the virtual machine (addition/removal of nodes, control of the state of one or more nodes, creation and shutdown of the virtual machine). Essentially, all the basic functions of the PVM console should be implemented.
Process management.
Again, the functionalities to implement should be, at least, the ones provided by the PVM console. It must be possible to spawn processes; the interface must allow the use of all the activation parameters and flags that can be used in PVM. Users should be able to redirect the output of the tasks towards the interface, so that they can control the behaviour of the programs in "real time".
Remote commands.
When several machines are available, users often need to open sessions on remote hosts or, at least, execute remote commands (e.g. uptime or xload, to check the machine load). Using UNIX commands such as rsh to execute a program on a remote host is rather inconvenient, so the interface should simplify this. Further simplification is needed for X11 programs: when an X11 program is run on a remote host, its windows have to be displayed on the local graphical terminal, which itself must be allowed to accept them. The xhost command is used to permit this, but it should be made automatic when X11 programs are run through the interface.
Remote compilation.
One of the most important functions of the interface should be the ability to compile a program on remote machines. Once the local directory with the source code has been specified, along with the Makefile to use and the hosts where the program has to be compiled, the remainder should be completely managed by the interface. This involves sending source files to remote hosts and starting compilers; any such operations carried out by hand are not only time consuming but also error prone.
Remote compilation is quite complex. The user of the interface must be able to follow the procedure step by step and, if necessary, stop it at any moment. For users to feel at ease with an automatic tool, all the operations should be carried out tidily. For example, old temporary files, created on the file systems of remote machines by previous compilations, must be deleted transparently.
Congurability.
The interface must be congurable so that it can be adapted
to any number of machines and sites. The conguration of the metacomputer
must therefore not be hard-co ded in the program, but specied by external
les.
To modify any graphical element on the interface (colours, window size,
fonts, etc.) resource les should be used. This is the standard technique for
all X11 programs, and does not require the interface to be re-compiled. Also,
using the X11 program
editres
, graphical elements can be modied without
having to write a resource le.
Finally, the program should not impose any constraints on the number of
nodes and networks in the system, nor on what type or where they are.
[Figure 1: WAMM]
4 WAMM
On the basis of the goals and criteria defined above, we have developed WAMM (Wide Area Metacomputer Manager), an interface prototype for metacomputing (fig. 2). WAMM was written in C; in addition to PVM, the OSF/Motif and xpm libraries are required (xpm is a freely distributable library which simplifies the use of pixmaps in X11; it is available via anonymous FTP from avahi.inria.fr). This section gives a general overview of the interface.
4.1 Configuration
To use WAMM, users have to write a configuration file which contains the description of the nodes that can be inserted into the virtual machine, i.e. all the machines that users can access and which have PVM installed. This operation only has to be done the first time that WAMM is used.
The configuration file is written in a simple declarative language. An excerpt of a configuration file is shown in the following:
WAN italy {
TITLE "WAN Italy"
PICT italy.xpm
MAN cineca 290 190
MAN pisa 210 280
LAN caspur 300 430 }
MAN cineca {
TITLE "Cineca"
PICT cineca.xpm
LAN cinsp1 220 370
LAN cinsp2 220 400
LAN cinsp3 220 430
LAN cinsp4 220 460 }
...
MAN pisa {
TITLE "MAN Pisa"
PICT pisa.xpm
LAN cnuce 200 100
LAN sns 280 55 }
LAN caspur {
TITLE "Caspur"
HOST caspur01
HOST caspur02
...
HOST caspur08 }
...
HOST cibs {
ADDRESS cibs.sns.it
PICT IBM41T.xpm
ARCH RS6K
OPTIONS "&"
XCMD "Xterm" "xterm -sb"
CMD "Uptime" "uptime"
CMD "Who" "who"
}
...
The le describes the geographical network used, named
italy
. The net-
work consists of some MANs and a LAN. For example,
pisa
MAN includes
the local networks
cnuce
and
sns
;
sns
LAN contains various workstations,
among which
cibs
.
As can be seen, the network is described following a tree-like structure. The root is the WAN, the geographical network that groups together all the hosts. Its children are Metropolitan (MAN) and Local (LAN) networks. A MAN can only contain local networks, whereas LANs contain only the hosts, the leaves of the tree.
Various items can be specified for each declared structure, many of which are optional and have default values. Each node of the tree (network or host) can have a PICT item, which is used to associate a picture with the structure. Typically, geographical maps are used for networks, indicating where the resources are; icons representing the architecture are used for the hosts.
The following is an example of a "rich" description of a host:
HOST cibs {
ADDRESS cibs.sns.it # host internet address
PICT IBM500.xpm # icon
ARCH RS6K # architecture type
INFO "RISC 6000 model 580" # other information
OPTIONS "& so=pw" # PVM options
XCMD "AIXTerm" "aixterm -sb"
XCMD "NEdit" "nedit" # remote commands
XCMD "XLoad" "xload"
CMD "Uptime" "uptime"
}
In this case the host is an IBM RISC 6000 workstation; the architecture type is RS6K (the same names adopted by PVM are used). To insert the node into the PVM, special flags have to be used (& so=pw are PVM's own options). The user can execute the aixterm, nedit, xload and uptime commands directly from the interface on the remote node. For example, using aixterm the user can connect directly to the machine: the aixterm program runs on the remote node, but the user receives the window on the local terminal and can use it to type UNIX commands.
4.2 Activation
The user starts the interface with the command:

wamm <configuration file>

PVM does not need to have been activated already. If the virtual machine does not exist at this point, WAMM creates it on the basis of the contents of the configuration file. The "base" window corresponding to the WAN is shown to the user (fig. 2).
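This start-up step can be pictured with plain PVM library calls. The following is a minimal sketch, not WAMM's actual code, of how an interface might start the local daemon and then add the hosts read from the configuration file; the host names are illustrative and error handling is reduced to messages:

/* vm_bootstrap.c - sketch of virtual machine start-up, not WAMM's own code.
 * Host names stand in for those read from the configuration file. */
#include <stdio.h>
#include <pvm3.h>

int main(void)
{
    char *hosts[] = { "cibs.sns.it", "caspur01" };   /* from the config file */
    char *pvmd_argv[] = { NULL };
    int   infos[2];
    int   i;

    /* Start a local pvmd if no virtual machine exists yet; a negative
     * return usually means a daemon is already running. */
    if (pvm_start_pvmd(0, pvmd_argv, 1) < 0)
        fprintf(stderr, "local pvmd not started (perhaps already running)\n");

    if (pvm_mytid() < 0) {                           /* enroll in PVM */
        fprintf(stderr, "cannot enroll in PVM\n");
        return 1;
    }

    /* Insert the remote nodes declared in the configuration file. */
    pvm_addhosts(hosts, 2, infos);
    for (i = 0; i < 2; i++)
        if (infos[i] < 0)
            fprintf(stderr, "could not add %s (error %d)\n",
                    hosts[i], infos[i]);

    pvm_exit();
    return 0;
}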
4.3 Windows
WAMM visualizes information relating to the networks (at WAN, MAN or
LAN level) in separate windows, one for each network. Hosts are shown
inside the window of the LAN they belong to.
[Figure 2: WAMM, example of initial WAN window]
The WAN window is split into three parts (fig. 2). At top left is the map indicated in the configuration file. There is a button for each sub-network; the user can select them to open the corresponding windows.
All the hosts declared in the configuration file are listed on the right. The list has various uses: the user can access a host quickly by double clicking on the name of the machine, without having to navigate through the various sub-networks. By selecting a group of hosts, various operations can be invoked from the menu:
- insert hosts into PVM;
- remove hosts from PVM;
- check hosts' status;
- compile on the selected hosts.
All the messages produced by WAMM are shown at the bottom. Figure 2 shows the information written when the program was started. MAN sub-networks are shown using the same type of window (fig. 3). The only difference is in the list of hosts, which, in this case, only includes nodes that belong to the MAN.
For the local networks, the windows are organized differently (see fig. 4). The window reproduces a segment of Ethernet with the related hosts. For each host the following are shown: the icon, the current status (PVM means that the node belongs to the virtual machine), the type of architecture and other information specified in the configuration file. Each icon has a popup menu associated with it, which can be activated using the right mouse button. This menu enables users to change the status of the node (add it to or remove it from PVM), run a compilation or execute one of the remote commands indicated in the configuration file. Basic operations on groups of hosts can still be carried out by selecting one or more nodes and invoking the operation from the window menu. In all cases, the results appear in the message area at the bottom of the window.
4.4 Compilation
The compilation of a program is mostly managed by WAMM: the user only has to select the hosts on which the compilation is to be done, and call Make from the Apps menu (fig. 2). Using a dialog box, the local directory that contains the source files and the Makefile can be specified, along with any parameters needed for the make command. No restrictions are made on the type of source files to compile: they can be written in any language.

[Figure 3: WAMM, a MAN window representing Pisa]
WAMM carries out, in the following order, the operations needed to compile an application (a sketch of this dispatch step is shown below):
1. all the source files are grouped into one file, in the standard tar format used on UNIX machines;
2. the file produced is compressed using the compress command;
3. a PVM task which deals with the compilation (PVMMaker) is spawned on each selected node, and the compressed file is sent to all these tasks.
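A minimal sketch of this dispatch step, not WAMM's actual source: the archive is built with system(), read into memory and sent as raw bytes to a PVMMaker spawned on each selected host. The message tag, file names and host list are illustrative assumptions:

/* compile_dispatch.c - sketch of the compile dispatch, not WAMM's real code.
 * Message tag, paths and host names are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <pvm3.h>

#define TAG_SOURCES 100                      /* assumed message tag */

int main(void)
{
    char *hosts[] = { "cibs.sns.it", "caspur01" };  /* selected by the user */
    char  makeargs[] = "";                   /* arguments for make, if any */
    int   nhosts = 2, i, tid;
    long  len;
    char *buf;
    FILE *fp;

    pvm_mytid();                             /* enroll in PVM */

    /* Steps 1-2: archive and compress the local source directory. */
    system("tar cf /tmp/src.tar ./src && compress -f /tmp/src.tar");

    /* Read the compressed archive into memory. */
    fp = fopen("/tmp/src.tar.Z", "rb");
    fseek(fp, 0, SEEK_END);
    len = ftell(fp);
    rewind(fp);
    buf = malloc(len);
    fread(buf, 1, len, fp);
    fclose(fp);

    /* Step 3: spawn one PVMMaker per selected host and send it the archive. */
    for (i = 0; i < nhosts; i++) {
        if (pvm_spawn("PVMMaker", NULL, PvmTaskHost, hosts[i], 1, &tid) != 1) {
            fprintf(stderr, "cannot spawn PVMMaker on %s\n", hosts[i]);
            continue;
        }
        pvm_initsend(PvmDataRaw);
        pvm_pkstr(makeargs);
        pvm_pklong(&len, 1, 1);
        pvm_pkbyte(buf, (int)len, 1);
        pvm_send(tid, TAG_SOURCES);
    }

    free(buf);
    pvm_exit();
    return 0;
}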
[Figure 4: WAMM, LAN window]
At this point WAMM's work ends: the remaining operations are carried out simultaneously by all the PVMMakers, each on its own node. Each PVMMaker performs the following actions:
1. it creates a temporary work directory inside the user's home directory;
2. the compressed file is received, expanded and saved in the new directory, and the source files are extracted;
3. the UNIX make command is executed.
At the end of the compilation the working directory is not destroyed (but it will be if there is a subsequent compilation). If needed, the user can thus connect to the host, modify the source code if necessary, and manually start a new compilation on the same files.

[Figure 5: WAMM, control window for the compilation]
Each PVMMaker notifies WAMM of all the operations that have been executed. The messages received from the PVMMakers are shown in a control window, to let users check how the compilation is going. Figure 5 depicts a sample compilation run on seven machines. For each machine, the step being performed when the snapshot was taken can be seen; for example, astro.sns.it has received its own copy of the directory and is expanding it, while calpar has successfully completed the compilation.
By selecting one or more hosts in the control window, output messages can be seen before the compilation completes, along with any errors produced by make and by the compiler. A make can be stopped at any moment with a menu command. If a node fails (for example due to errors in the source code), this does not affect the other nodes.
If the compilation is successful, the same Makefile that was used to carry it out can copy the executable files produced into the directory

$PVM_ROOT/bin/$PVM_ARCH

which PVM uses as a "storage" area for executable files. This operation can be carried out on each node.
[Figure 6: WAMM, PVM process spawning]
4.5 Tasks
WAMM allows PVM tasks to be spawned and controlled. Programs are executed by selecting Spawn from the Apps menu (fig. 2). A dialog box is opened in which the parameters used by PVM for requesting the execution of the tasks can be inserted (fig. 6). The following can be specified (a sketch of the corresponding PVM call is given after the list):
- the name of the program;
- any command-line arguments to pass to the program;
- the number of copies to execute;
- the mapping scheme: by specifying Host, all the tasks are activated on one machine (whose address has to be indicated); by specifying Arch, PVM chooses only machines with a user-selected architecture; finally, Auto can be used to let PVM choose the nodes on which the copies have to be spawned;
- PVM's various flags (Debug, Trace, MPP);
- any redirection of the output of the program to WAMM (flag Output);
- the events to record if the Trace option is enabled.
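In PVM terms these choices map onto the arguments of pvm_spawn() (and pvm_catchout() for output redirection). The following minimal sketch shows the plain library call rather than WAMM's dialog code; the program name, arguments and host are illustrative:

/* spawn_sketch.c - illustrative use of pvm_spawn with the options listed
 * above; program name, arguments and host are assumptions. */
#include <stdio.h>
#include <pvm3.h>

int main(void)
{
    char *args[] = { "-n", "1000", NULL };   /* command-line arguments */
    int   tids[4];
    int   started;

    pvm_mytid();                             /* enroll in PVM */
    pvm_catchout(stdout);                    /* redirect children's output (flag Output) */

    /* Mapping scheme "Host": run 4 copies on a given machine.
     * PvmTaskArch with "RS6K" would select an architecture class instead,
     * and PvmTaskDefault would let PVM place the tasks (Auto). */
    started = pvm_spawn("nbody", args,
                        PvmTaskHost | PvmTaskDebug,   /* Debug/Trace/MPP flags */
                        "cibs.sns.it", 4, tids);

    printf("%d of 4 tasks started\n", started);

    pvm_exit();
    return 0;
}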
[Figure 7: WAMM, control window for PVM tasks]

By selecting Spawn, the new tasks are run. The Windows menu (fig. 2) can be used to open a control window that contains status information on all the PVM tasks being executed in the virtual machine (fig. 7). Data on tasks are updated automatically: if a task terminates, its status is changed, and new tasks that appear in the system are added to the list, even if they were not spawned by WAMM. The output of processes activated with the Output option can be seen in separate windows and can also be saved to a file. If the output windows are open, new messages from the tasks are shown immediately.
Kill (task destruction) and Signal (sending a specific signal) are possible for all PVM tasks, including those not spawned by WAMM.
5 WAMM's implementation issues
This section outlines the most important aspects of the implementation. Some reference is made to concepts and functionalities of the UNIX operating system and the PVM environment; see [20] and [11] for further details.
5.1 Structure of the program
Each complex function of WAMM is implemented by an independent module; the modules are then linked during compilation. This type of structure is useful for all complex programs and facilitates modifications to the code and the insertion of new functionalities. The set of modules can be subdivided into three levels:
Application modules. These are high level modules which implement task spawning and control, as well as the compilation of source code.
Graphic modules. These include all the functions needed to create the graphical interface of the program.
Network modules. These are control modules which act as an interface between the application and the underlying virtual machine.
The program is totally event-driven. Once the initialization of the internal modules and the related data structures is complete, the program stops and waits for messages from the PVM environment or from the user (for example, the termination of an active task or the selection of a button in a window). This is typical X11 program behaviour.
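WAMM's actual main loop is not detailed here, but one common way to combine an X11 event loop with PVM messages, shown purely as an illustrative pattern, is to drain the PVM message queue from an Xt timer callback:

/* event_loop_sketch.c - illustrative pattern for mixing Xt events with PVM
 * messages by polling from a timer; not WAMM's actual main loop. */
#include <stdio.h>
#include <X11/Intrinsic.h>
#include <pvm3.h>

#define POLL_MS 250

static void poll_pvm(XtPointer client_data, XtIntervalId *id)
{
    XtAppContext app = (XtAppContext)client_data;
    int bufid;

    /* Drain any pending PVM messages without blocking the GUI. */
    while ((bufid = pvm_nrecv(-1, -1)) > 0) {
        int bytes, tag, tid;
        pvm_bufinfo(bufid, &bytes, &tag, &tid);
        printf("message tag %d from task t%x (%d bytes)\n", tag, tid, bytes);
        /* ...dispatch to the application modules here... */
    }
    XtAppAddTimeOut(app, POLL_MS, poll_pvm, client_data);   /* re-arm */
}

int main(int argc, char **argv)
{
    XtAppContext app;
    Widget top = XtVaAppInitialize(&app, "Sketch", NULL, 0,
                                   &argc, argv, NULL, NULL);

    pvm_mytid();                                  /* enroll in PVM */
    XtAppAddTimeOut(app, POLL_MS, poll_pvm, (XtPointer)app);

    XtRealizeWidget(top);
    XtAppMainLoop(app);                           /* event-driven from here on */
    return 0;
}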
5.2 Host control
During initialization WAMM enrolls in PVM, so that all the control functions of the virtual machine offered by the environment can be exploited. Specifically, the insertion and removal of hosts is monitored using the function pvm_notify: WAMM is informed of any changes in the metacomputer configuration and shows them to the user. The notification mechanism is also able to recognize variations produced by external programs; for example, if hosts are added or removed using the PVM console, the modification is detected by WAMM as well.
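A minimal sketch of this use of pvm_notify (the message tags are illustrative assumptions, and the event loop is reduced to a print statement):

/* host_notify_sketch.c - illustrative host add/delete notification in the
 * spirit of WAMM's host control; message tags are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <pvm3.h>

#define TAG_HOST_ADD    200
#define TAG_HOST_DELETE 201

int main(void)
{
    struct pvmhostinfo *hostp;
    int nhost, narch, i, *dtids;

    pvm_mytid();                                    /* enroll in PVM */

    /* Request a message whenever a new host joins the virtual machine
     * (-1: notify for every future addition; the tid list is not used). */
    pvm_notify(PvmHostAdd, TAG_HOST_ADD, -1, (int *)0);

    /* Watch every host currently in the machine for deletion. */
    pvm_config(&nhost, &narch, &hostp);
    dtids = (int *)malloc(nhost * sizeof(int));
    for (i = 0; i < nhost; i++)
        dtids[i] = hostp[i].hi_tid;                 /* pvmd tid of each host */
    pvm_notify(PvmHostDelete, TAG_HOST_DELETE, nhost, dtids);
    free(dtids);

    for (;;) {                                      /* simplified event loop */
        int bufid = pvm_recv(-1, -1);
        int bytes, tag, tid;
        pvm_bufinfo(bufid, &bytes, &tag, &tid);
        if (tag == TAG_HOST_ADD)
            printf("host(s) added to the virtual machine\n");
        else if (tag == TAG_HOST_DELETE)
            printf("a host left the virtual machine\n");
    }
}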
5.3 Task control
Unfortunately, PVM's notification mechanism for tasks is not as complete as that for hosts: by using pvm_notify it is possible to find out when a given task terminates, but not when a new task appears in the system. To get complete control of tasks too, WAMM uses satellite processes, named PVMTaskers. During the initialization phase a PVMTasker is spawned on each node in the virtual machine. Each PVMTasker periodically queries its own PVM daemon to get the list of tasks running on the node. When variations with respect to the previous check are found, WAMM is sent the relevant information.
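A sketch of the polling loop such a satellite could run (this is not the actual PVMTasker source; the message tag is an assumption, and a real implementation would compare the full task lists rather than just their length):

/* pvmtasker_sketch.c - illustrative polling loop of a PVMTasker-like
 * satellite; not the actual PVMTasker source. */
#include <stdio.h>
#include <unistd.h>
#include <pvm3.h>

#define TAG_TASKLIST 300              /* assumed message tag */
#define POLL_SECONDS 5

int main(void)
{
    int parent = pvm_parent();                   /* the interface that spawned us */
    int host   = pvm_tidtohost(pvm_mytid());     /* pvmd tid of this node */
    int prev_ntask = -1;

    for (;;) {
        struct pvmtaskinfo *taskp;
        int ntask, i;

        /* Ask the local daemon for the tasks running on this host only. */
        pvm_tasks(host, &ntask, &taskp);

        /* Report only when something changed (here the count alone is
         * compared, which is a simplification). */
        if (ntask != prev_ntask) {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&ntask, 1, 1);
            for (i = 0; i < ntask; i++)
                pvm_pkint(&taskp[i].ti_tid, 1, 1);
            pvm_send(parent, TAG_TASKLIST);
            prev_ntask = ntask;
        }
        sleep(POLL_SECONDS);
    }
}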
Using PVMTasker processes is just one way to emulate a more complete pvm_notify. One drawback is that PVMTaskers have to be installed on each node indicated in the configuration file. An alternative is to let the interface itself request, from time to time, the complete list of tasks (PVM's function pvm_tasks can be used to do this). This method does not need satellites but has some drawbacks; specifically:
- data from all the daemons have to be transmitted to WAMM, even if the number of tasks has not changed since the previous check, whereas in the first solution messages are only sent when necessary;
- if one of the nodes fails, pvm_tasks waits until a timeout is reached, and some minutes may pass before the function resumes with the next nodes and sends the list of tasks to WAMM. The first solution, based on independent tasks, does not have this problem: if a node fails, only its own tasks will not be updated.
There is a third solution, which exploits PVM's concept of a tasker (taskers, along with hosters, were introduced with version 3.3 of PVM). A tasker is a PVM program enabled to receive the control messages which, in the virtual machine, are normally used to request the activation of a new process. This basically means that if a tasker process is active on a node, the local daemon does not activate the program itself, but passes the request to the tasker; the tasker executes it and, when the activated process has terminated, informs the daemon. We could write a tasker in such a way that it not only deals with the daemon, but also notifies the interface of the activation and termination of its own tasks. This solution is the least expensive in terms of communications (the periodic messages between the control task and the daemon of its node are eliminated too), but it is not without drawbacks:
- a program still has to be installed on each node;
- no information is available on PVM processes created before the taskers are registered.
When developing WAMM we tried out all three solutions. We opted for the first, as it was by far the best both in terms of network usage and control capabilities.
5.4 Remote commands
The PVMTasker processes described above are also used to execute programs on remote hosts: the satellite task receives the name of the program along with its command-line arguments and executes a fork. The child process executes the program, and the output is sent back to the interface. This solution has one main drawback: it is impossible to execute commands on hosts where no PVMTasker is running.
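A sketch of this execute-and-report pattern (not the actual PVMTasker code; the message tags are assumptions, and popen() is used as a compact stand-in for the fork described above):

/* remote_cmd_sketch.c - how a satellite might run a command and ship its
 * output back to the interface; not the actual PVMTasker code. */
#include <stdio.h>
#include <pvm3.h>

#define TAG_CMD    400                /* assumed: command request */
#define TAG_OUTPUT 401                /* assumed: command output */

static void run_and_report(int parent, const char *cmdline)
{
    char  line[1024];
    FILE *p = popen(cmdline, "r");    /* fork + exec via the shell,
                                         capturing standard output */
    if (p == NULL)
        return;
    while (fgets(line, sizeof(line), p) != NULL) {
        pvm_initsend(PvmDataDefault);
        pvm_pkstr(line);
        pvm_send(parent, TAG_OUTPUT); /* one message per output line */
    }
    pclose(p);
}

int main(void)
{
    char cmdline[1024];
    int  parent = pvm_parent();       /* the interface */

    for (;;) {
        pvm_recv(parent, TAG_CMD);    /* wait for a command request */
        pvm_upkstr(cmdline);          /* e.g. "uptime" or "xload" */
        run_and_report(parent, cmdline);
    }
}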
The classical alternative consists in using commands such as rsh or rexec. These can be used for any host, even if it is not in PVM. For example, to find out the load of the node evans.cnuce.cnr.it, a user connected to calpar.cnuce.cnr.it can write:

rsh evans.cnuce.cnr.it uptime

The uptime command is executed on evans and the output is shown on calpar. The nodes do not have to belong to the virtual machine (nor, in fact, does PVM have to be installed). The problem arises from the fact that rsh and rexec can be considered alternatives:
- to use rsh, the user has to give remote hosts permission to accept the execution requests, by creating an .rhosts file on each node used;
- to use rexec, no .rhosts files are needed, but, unlike rsh, the password of the account on the remote host is requested.
Neither method is really satisfactory: .rhosts files create security problems and are often avoided by system administrators, while the password request is not acceptable when there are many accounts or commands to deal with. PVM therefore allows both methods to be used: the user specifies in the hostfile, for each node, which one should be used (the same options are admitted in the configuration file used by WAMM). This information is needed since PVM has to activate the pvmd daemon on all the remote nodes inserted in the virtual machine, using either rsh or rexec.
The alternative method of executing a remote command would thus entail examining the configuration file to establish whether rsh or rexec is required for each node. In any case it is easier to run the command from the PVMTasker on the remote node: with respect to the PVMTasker, the command is executed locally, so neither rsh nor rexec is used.
5.5 Remote compilation
As described earlier, WAMM compiles programs on remote hosts by using a PVM task called PVMMaker. PVMMakers are spawned, on each node required, only at compilation time, and terminate immediately after the conclusion of the make process. The compressed directory and the command-line arguments for make are sent to the PVMMakers. Upon receipt of these data, each PVMMaker expands the directory, activates the compilation on its own node and sends messages back to the interface about the current activity. The output messages produced by the compilers are also sent to the interface, in the form of normal PVM messages.
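A minimal sketch of the receiving side of this exchange (not the actual PVMMaker source; the message tags, paths and directory layout mirror the illustrative assumptions of the dispatch sketch in Section 4.4):

/* pvmmaker_sketch.c - illustrative satellite that receives the compressed
 * archive, unpacks it and runs make; not the actual PVMMaker source. */
#include <stdio.h>
#include <stdlib.h>
#include <pvm3.h>

#define TAG_SOURCES 100               /* must match the dispatching side */
#define TAG_STATUS  101               /* assumed status tag */

int main(void)
{
    char  makeargs[256], cmd[512];
    long  len;
    char *buf;
    FILE *fp;
    int   status;
    int   parent = pvm_parent();      /* the interface that spawned us */

    pvm_recv(parent, TAG_SOURCES);    /* wait for the compressed archive */
    pvm_upkstr(makeargs);             /* arguments for make */
    pvm_upklong(&len, 1, 1);
    buf = malloc(len);
    pvm_upkbyte(buf, (int)len, 1);

    /* 1-2: temporary work directory in $HOME; save and expand the archive. */
    system("rm -rf $HOME/.wamm_build && mkdir $HOME/.wamm_build");
    fp = fopen("/tmp/src.tar.Z", "wb");
    fwrite(buf, 1, len, fp);
    fclose(fp);
    free(buf);
    system("cd $HOME/.wamm_build && uncompress -c /tmp/src.tar.Z | tar xf -");

    /* 3: run make and report the outcome back to the interface. */
    snprintf(cmd, sizeof(cmd), "cd $HOME/.wamm_build/src && make %s", makeargs);
    status = system(cmd);
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&status, 1, 1);
    pvm_send(parent, TAG_STATUS);

    pvm_exit();
    return 0;
}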
Much of what was said about the execution of remote commands also applies to compilation. To transfer a file onto a host the UNIX command rcp can be used, but, like rsh, it requires a suitable .rhosts file on the destination node. The only real alternative is to use the ftp protocol to transfer files, but managing it is considerably more complex than simply transferring data between tasks via PVM primitives.
The execution of the commands needed for the compilation could also be accomplished by using rsh or rexec. However, not only would this lead to the problems described above, but it would also not be as efficient as the solution based on the PVMMakers: the interface would have to manage the results of all the operations on all the hosts. By using the PVMMakers, the interface only has to spawn the tasks, send them the directory with the source files and show the user the messages that come back from the various PVMMakers. The compilation is carried out in parallel on all the nodes.
The disadvantages of using PVMMakers are similar to those described for the PVMTaskers: a PVMMaker has to be installed on each node that the user has access to, and compilations cannot be made on a node that is not part of the virtual machine (the PVMMaker could not be activated). To address the first point, the functionalities of the PVMTasker and the PVMMaker could perhaps be brought together into one task, thus simplifying the installation of WAMM. However, having two separate tasks offers greater modularity.
6 Future developments
The current version of WAMM is only the starting point for building a complete metacomputing environment based on PVM. Some implementation choices, such as subdividing modules with a similar structure, were made in order to simplify as far as possible the insertion of new features. Possible new features are outlined below.
Resource management.
The management of the resources of a metacomputer (nodes, networks, etc.) is one of the most important aspects of any metacomputing environment. WAMM currently lets users work only with the simple mechanisms implemented in PVM to control the nodes and task execution. For example, to spawn a task, the user can only choose whether to use a particular host, a class of architectures or any node selected by PVM on a round-robin basis. For complex applications, more sophisticated mapping algorithms are needed. Such algorithms could be implemented in the module that activates the tasks.
Performance analysis.
The current version of WAMM allows a task to be activated in trace mode: whenever the task calls a PVM function, a trace message is sent to the interface. By appropriately recording and organizing these data, all the information needed to study program performance can be obtained; specifically, the time spent on calculation, communication and various overheads can be determined. At the moment WAMM does not make any use of the trace messages it receives; a future version should have a module which can collect these data, show them to the user as graphics or tables, and save them in a "standard" format that can be used in subsequent analyses with external tools, such as ParaGraph [15].
Remote commands.
The possibility of executing remote commands could be exploited to run the same operation on several hosts simultaneously. This type of functionality, not yet implemented, would allow some problems regarding the development and maintenance of PVM programs to be resolved. For example, with one command an old PVM executable file could be deleted from a group of nodes; manual deletion is not feasible if there are many nodes, so it would be advantageous to use the interface.
7 Related work
WAMM's features can be divided into two groups: on the one hand, it provides users with a set of facilities to control and configure the metacomputer; on the other, it can also be considered a software development tool. Some other packages offer similar functionality.
XPVM is a graphical console for PVM with support for virtual machine and process management. The user can change the metacomputer configuration (by adding or removing nodes) and spawn tasks, in a way similar to WAMM's. Compared with WAMM, XPVM does not provide the same "geographical" view of the virtual machine and is probably more suitable for smaller systems. XPVM has no facilities for source file distribution, parallel compilation or execution of commands on remote nodes; however, it includes a section for trace data analysis and visualization, not yet implemented in WAMM.
HeNCE [13, 14] is a PVM-based metacomputing environment which greatly simplifies the software development cycle. In particular, it implements a system for source file distribution and compilation on remote nodes similar to that used by WAMM: source files can be compiled in parallel on several machines, and this task is controlled by processes comparable to the PVMTaskers. HeNCE lacks all the virtual machine management facilities provided by WAMM; for these, it is often necessary to use the PVM console. It should be said that HeNCE was designed with different goals: the simplification of application development is achieved mostly by using a different programming model, with communication abstraction, rather than by providing remote compilation facilities.
WAMM Overview
REFERENCES 25
References
[1] L. Smarr, C.E. Catlett. Metacomputing. Communications of the ACM, Vol. 35, No. 6, pp. 45-52, June 1992.
[2] R. Freund, D. Conwell. Superconcurrency: A Form of Distributed Heterogeneous Supercomputing. Supercomputing Review, pp. 47-50, October 1990.
[3] A.A. Khokhar, V.K. Prasanna, M.E. Shaaban, C. Wang. Heterogeneous Computing: Challenges and Opportunities. IEEE Computer, pp. 18-27, June 1993.
[4] P. Messina. Parallel and Distributed Computing at Caltech. Technical Report CCSF-10-91, Caltech Concurrent Supercomputing Facilities, California Institute of Technology, USA, October 1991.
[5] P. Messina. Parallel Computing in USA. Technical Report CCSF-11-91, Caltech Concurrent Supercomputing Facilities, California Institute of Technology, USA, October 1991.
[6] P. Huish (Editor). European Meta Computing Utilising Integrated Broadband Communication. EU Project Number B2010, Technical Report.
[7] P. Arbenz, H.P. Luthi, J.E. Mertz, W. Scott. Applied Distributed Supercomputing in Homogeneous Networks. IPS Research Report No. 91-18, ETH Zurich.
[8] V.S. Sunderam. PVM: a Framework for Parallel Distributed Computing. Concurrency: Practice and Experience, 2(4):315-339, December 1990.
[9] G.A. Geist, V.S. Sunderam. Network-Based Concurrent Computing on the PVM System. Concurrency: Practice and Experience, Vol. 4(4), July 1992.
[10] J.J. Dongarra, G.A. Geist, R. Manchek, V.S. Sunderam. The PVM Concurrent System: Evolution, Experiences and Trends. Parallel Computing, 20 (1994), pp. 531-545.
[11] A.L. Beguelin, J.J. Dongarra, G.A. Geist, R. Manchek, V.S. Sunderam, and W. Jiang. PVM 3 Users' Guide and Reference Manual. Technical Report ORNL/TM-12187, Oak Ridge National Laboratory, May 1994.
[12] R. Baraglia, G. Bartoli, D. Laforenza, A. Mei. Network Computing: definizione e uso di una propria macchina virtuale parallela mediante PVM. CNUCE Internal Report C94-05, January 1994.
[13] A. Beguelin, J.J. Dongarra, G.A. Geist, R. Manchek, V.S. Sunderam. Graphical Development Tools for Network-Based Concurrent Supercomputing. Proceedings of Supercomputing 91, Albuquerque, 1991.
[14] A. Beguelin, J.J. Dongarra, G.A. Geist, R. Manchek, K. Moore, R. Wade, J. Plank, V.S. Sunderam. HeNCE: A Users' Guide, Version 2.0.
[15] M.T. Heath and J.E. Finger. ParaGraph: A Tool for Visualizing Performance of Parallel Programs. Oak Ridge National Laboratory, Oak Ridge, TN, 1994.
[16] G. Bertin, M. Stiavelli. Reports on Progress in Physics, 56, 493, 1993.
[17] S. Aarseth. Multiple Timescales. Eds. J.U. Brackbill and B.I. Cohen, p. 377, Orlando: Academic Press, 1985.
[18] L. Hernquist. Computer Physics Communications, 48, 107, 1988.
[19] H.E. Bal, J.G. Steiner, A.S. Tanenbaum. Programming Languages for Distributed Computing Systems. ACM Computing Surveys, Vol. 21, No. 3, September 1989.
[20] H. Hahn. A Student's Guide to UNIX. McGraw-Hill, Inc., 1993.
[21] M. Brain. Motif Programming: The Essentials ... and More. Digital Press, 1992.
Article
When distributed systems first appeared, they were programmed in traditional sequential languages, usually with the addition of a few library procedures for sending and receiving messages. As distributed applications became more commonplace and more sophisticated, this ad hoc approach became less satisfactory. Researchers all over the world began designing new programming languages specifically for implementing distributed applications. These languages and their history, their underlying principles, their design, and their use are the subject of this paper. We begin by giving our view of what a distributed system is, illustrating with examples to avoid confusion on this important and controversial point. We then describe the three main characteristics that distinguish distributed programming languages from traditional sequential languages, namely, how they deal with parallelism, communication, and partial failures. Finally, we discuss 15 representative distributed languages to give the flavor of each. These examples include languages based on message passing, rendezvous, remote procedure call, objects, and atomic transactions, as well as functional languages, logic languages, and distributed data structure languages. The paper concludes with a comprehensive bibliography listing over 200 papers on nearly 100 distributed programming languages.