ABSTRACT: N-body algorithms for long-range unscreened interactions such as gravity belong to a class of highly irregular problems whose optimal solution is a challenging task for present-day massively parallel computers. In this paper we describe a strategy for optimal memory and work distribution, which we have applied to our parallel implementation of the Barnes & Hut (1986) recursive tree scheme on a Cray T3D using the CRAFT programming environment. We performed a series of tests to find an optimal data distribution in the T3D memory and to identify a strategy for Dynamic Load Balancing, in order to obtain good performance when running large simulations (more than 10 million particles). The test results show that the step duration depends on two main factors: data locality and T3D network contention. By increasing data locality we can minimize the step duration when the closest bodies (direct interactions) tend to be located in the same PE's local memory (contiguous block subdivision, high granularity), whereas the tree properties have a fine-grained distribution. In very large simulations, network contention gives rise to an unbalanced load. To remedy this we devised an automatic work redistribution mechanism which provides good Dynamic Load Balancing at the price of an insignificant overhead.
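For readers unfamiliar with the tree scheme the abstract refers to, here is a minimal 2-D sketch (our own illustration, not the paper's code) of the Barnes & Hut opening-angle test: a distant cell is replaced by its monopole (center of mass) when its size-to-distance ratio falls below a tolerance theta. The `Node` type, the softening `eps`, and G = 1 units are assumptions for the example only.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    # A tree cell: center of mass, total mass, cell side length;
    # an empty children list marks a leaf (single body).
    com: tuple
    mass: float
    size: float
    children: list = field(default_factory=list)

def force_on(body_pos, node, theta=0.5, eps=1e-3):
    """Accumulate the gravitational acceleration on one body (G = 1)."""
    dx = node.com[0] - body_pos[0]
    dy = node.com[1] - body_pos[1]
    r = math.sqrt(dx * dx + dy * dy) + eps  # softening avoids a 1/0 self-term
    if not node.children or node.size / r < theta:
        # Leaf, or cell far enough away: use its monopole moment.
        f = node.mass / r**3
        return (f * dx, f * dy)
    # Otherwise "open" the cell and recurse into its children.
    ax = ay = 0.0
    for child in node.children:
        cax, cay = force_on(body_pos, child, theta, eps)
        ax += cax
        ay += cay
    return (ax, ay)
```

The cost per body drops from O(N) of a direct sum to O(log N), at the price of the irregular, data-dependent tree walk that makes memory and work distribution on a parallel machine nontrivial.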
ABSTRACT: Particle codes which simulate the evolution of self-gravitating systems are a popular tool of contemporary cosmological research. Here we compare two different approaches to the parallelization of tree N-body codes, namely work-sharing and message passing. We also discuss a Dynamic Load Balancing algorithm which we have applied to the work-sharing code. Preliminary results show the merits and problems of these two approaches.
Applied Parallel Computing, Computations in Physics, Chemistry and Engineering Science, Second International Workshop, PARA '95, Lyngby, Denmark, August 21-24, 1995, Proceedings; 01/1995
ABSTRACT: We describe a new parallel implementation of the octal-hierarchical tree N-body algorithm on SHared Memory (SHM) systems which we have recently developed. In an effort to optimize the code as much as possible on a generic SHM system, we have modified the original algorithm introduced by Barnes and Hut, introducing a new scheme of “grouping” of particle interactions. We present speedup and efficiency results.
Parallel Computation, 4th International ACPC Conference Including Special Tracks on Parallel Numerics (ParNum'99) and Parallel Computing in Image Processing, Video Processing, and Multimedia, Salzburg, Austria, February 1999, Proceedings; 01/1999
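The “grouping” idea mentioned in the abstract above can be sketched as follows (a hypothetical illustration, not the paper's actual scheme): nearby particles are gathered into a group, and one interaction list of tree cells is built that is conservatively valid for every group member, by shrinking the cell distance used in the opening test by the group's bounding radius. All names here (`Node`, `build_interaction_list`) are assumptions for the example.

```python
import math
from collections import namedtuple

# Minimal tree cell: center of mass, total mass, side length, children.
Node = namedtuple("Node", "com mass size children")

def group_center_and_radius(positions):
    """Geometric center of a particle group and its bounding radius."""
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    radius = max(math.hypot(p[0] - cx, p[1] - cy) for p in positions)
    return (cx, cy), radius

def build_interaction_list(node, center, radius, theta=0.5, out=None):
    """Collect cells that every particle in the group may treat as monopoles."""
    if out is None:
        out = []
    dx = node.com[0] - center[0]
    dy = node.com[1] - center[1]
    # Shrink the effective distance by the group radius: the opening test
    # must hold for the group's farthest member, not just its center.
    r = max(math.hypot(dx, dy) - radius, 1e-12)
    if not node.children or node.size / r < theta:
        out.append(node)
    else:
        for child in node.children:
            build_interaction_list(child, center, radius, theta, out)
    return out
```

One tree walk then serves the whole group, amortizing the irregular traversal across many particles; only the final body-cell force sums remain per-particle, which is what makes grouping attractive on shared-memory machines.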