arXiv:0804.4639v1 [astro-ph] 29 Apr 2008
Beowulf Analysis Symbolic INterface BASIN:
Interactive Parallel Data Analysis for Everyone∗
E. Vesperini1, D.M. Goldberg1, S. McMillan1, J. Dura2, D. Jones2
1Department of Physics, Drexel University, Philadelphia, PA
email@example.com, firstname.lastname@example.org, email@example.com
2Department of Computer Science, Drexel University, Philadelphia, PA
April 29, 2008
Submitted for publication to Computing in Science and Engineering
special issue on Computational Astrophysics
The advent of affordable parallel computers such as Beowulf PC clusters and, more
recently, of multi-core PCs has been highly beneficial for a large number of scientists
and smaller institutions that might not otherwise have access to substantial computing
facilities. However, there has not been analogous progress in the development and
dissemination of parallel software: scientists need the expertise to develop parallel
codes and have to invest a significant amount of time in the development of tools even
for the most common data analysis tasks. We describe the Beowulf Analysis Symbolic
INterface (BASIN), a multi-user parallel data analysis and visualization framework.
BASIN is aimed at providing scientists with a suite of parallel libraries for astrophysical
data analysis along with general tools for data distribution and parallel operations on
distributed data to allow them to easily develop new parallel libraries for their specific
1 You have a Supercomputer. Are you a
The first “Beowulf” PC cluster  provided a proof of concept that sparked a revolution
in the use of off-the-shelf computers for numerical simulations, data mining, and statistical
analysis of large datasets. Because these clusters use commodity components, this
paradigm has provided an unbeatable ratio of computing power per dollar, and over the past
decade Beowulf-class systems have enabled a wide range of high-performance applications
on department-scale budgets.
Beowulfs have unquestionably been highly beneficial in our own field of astrophysics, especially
in smaller academic institutions that might not otherwise have access to substantial
computing facilities. Beowulfs have been successfully applied to numerical simulations that
use a variety of algorithmic approaches to study systems spanning the range of scales from
individual stars and supernovae (see e.g. ) to the horizon size of the universe (see e.g.).
Unfortunately, progress in the development of affordable parallel computers has not been
matched by analogous progress in the development and dissemination of parallel software
aimed at providing scientists with the tools needed to take advantage of the computing power
of parallel machines. Scientists with access to a Beowulf cluster still need the expertise to
develop parallel codes, and have to spend a tremendous amount of time in the development
and testing of tools, even for the most common data analysis tasks. This is in stark contrast to
the situation for serial data analysis and simulations, for which a large number of standard
general-purpose (e.g. Matlab, Maple, R/Splus, Mathematica, IDL) and specialized (e.g.
IRAF for astronomical data analysis) tools and libraries exist.
The recent advent of multi-core PCs has broadened still further the base of commodity
machines that enable computationally intensive parallel simulations and data analysis of
large datasets. However, this impressive technological advance also makes the lack
of general tools for parallel data analysis even more striking. The result is a significant
barrier to entry for users or developers of parallel computing applications.
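To make this barrier concrete, the following minimal sketch shows the bookkeeping a scientist must write by hand even for a trivial task, the mean of a dataset: splitting the data, reducing each piece independently, and combining the partial results. It is an illustration only and is not part of BASIN. The names split_evenly, partial_sum and distributed_mean are hypothetical, and a thread pool from the Python standard library stands in for the worker processes or cluster nodes of a real parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(values, n_chunks):
    """Split a list into n_chunks contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def partial_sum(chunk):
    """Work done independently on each chunk: return (sum, count)."""
    return sum(chunk), len(chunk)


def distributed_mean(values, n_workers=4):
    """Scatter the data, reduce each chunk in a worker, then combine the partial results."""
    chunks = split_evenly(values, n_workers)
    # Assumption: threads stand in for real workers; they illustrate the pattern only.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("cannot take the mean of an empty dataset")
    return total / count


if __name__ == "__main__":
    data = list(range(1, 101))
    print(distributed_mean(data, n_workers=4))  # prints 50.5
```

Even in this toy case the user has to decide how to partition the data, which quantities each worker must return, and how to combine them. On an actual cluster the same steps would also involve explicit communication between nodes, which is the kind of effort a general parallel analysis package is meant to hide.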
As the gap between increasing parallel computing power and the availability of software
tools needed to exploit it has become more and more evident, a number of projects have
attempted to develop such tools. Our team has developed a package of parallel computational
tools—the Beowulf Analysis Symbolic INterface (BASIN)—to deal with precisely these
issues. BASIN is a suite of parallel computational tools for the management, analysis and
visualization of large datasets. It is designed with the idea that not all scientists need to
be specialists in numerics. Rather, a user should be able to interact with his or her data
in an intuitive way. In its current form, the package can be used either as a set of library
functions in a monolithic C++ program or interactively, from a Python shell, using the
BASIN Python interface.
BASIN is not the only package with this goal in mind. This magazine has recently presented
descriptions of Star-P  (a commercial package aimed at providing environments such as
Matlab, Maple and others with seamless parallel capabilities) and PyBSP  (a Python library
for the development of parallel programs following the bulk synchronous parallel model). As
an open-source project, BASIN's growth is to be driven by the needs and contributions of users
and developers both in the astrophysical community and, possibly, in other computationally
study of the multi-scale physics of dense stellar systems. The result will be a single
environment in which a user can run novel numerical simulations, including a wide
variety of physical processes, and, within the same platform, perform parallel analysis
and visualization of the results, in real time or post-production.
We thank all the members of the BASIN team (B. Char, D. Cox, A. Dyszel, J. Haaga, M.
Hall, L. Kratz, S. Levy, P. MacNeice, E. Mamikonyan, M. Soloff, A. Tyler, M. Vogeley and
B. Whitlock) for many discussions and comments on the issues presented in this paper.
 Becker, D.J., Sterling, T.J., Savarese, D.F., Dorband, J.E., Ranawake, U.A., & Packer,
C.V., 1995, Proceedings of the International Conference on Parallel Processing
 Blondin, J.M., “Discovering new dynamics of core-collapse supernova shock waves”, 2005,
Journal of Physics: Conference Series 16 370
 Norman, M.L., Bryan, G.L., Harkness, R., Bordner, J., Reynolds, D., O’Shea, B. & Wagner,
R., “Simulating Cosmological Evolution with Enzo”, to appear in Petascale Computing:
Algorithms and Applications, Ed. D. Bader, CRC Press LLC (2007), arXiv:0705.1556v1
 Springel V. et al. 2005, “Simulations of the formation, evolution and clustering of galaxies
and quasars”, Nature 435 629
 S. Raghunathan, “Making a Supercomputer Do What You Want: High-Level Tools for
Parallel Programming”, Computing in Science and Engineering, vol. 8, no. 5, 2006, pp. 70-80
 K. Hinsen, “Parallel Scripting with Python”, Computing in Science and Engineering, vol.
9, no. 6, 2007, pp. 82-89
 Childs H., Brugger E., Bonnell K., Meredith J., Miller M., Whitlock B., Max N.: A contract
based system for large data visualization. In Proc. of Visualization 2005 Conference.
 D. York, et al., “The Sloan Digital Sky Survey: Technical Summary”, Astronomical Journal,
vol. 120, 2000, p. 1579
 Carlson, B., El-Ghazawi, T., Numrich, R., Yelick K., 2003, “Programming in the Partitioned
Global Address Space Model”, https://upc-wiki.lbl.gov/UPC/images/b/b5/PGAS