Parallel implementations of three scientific applications using LB_Migrate
ABSTRACT: In this paper we focus on the implementation of large scientific applications with LB_Migrate, a dynamic load balancing library. The library employs dynamic loop scheduling techniques to counter the performance degradation caused by load imbalance, provides a flexible interface to the application's native data structures, and performs data migration. The library is reusable and not application specific. For initial testing, the library was employed in three applications: the profiling of an automatic quadrature routine, the simulation of a hybrid model for image denoising, and N-body simulations. We describe the original applications without the library and the changes needed to interface them with the library, and we present experimental results. Performance results indicate that the library adds little overhead, at most 6%, with the exact amount varying from application to application. However, the benefits gained from the use of the library are substantial.
ABSTRACT: This paper proposes guided self-scheduling, a new approach for scheduling arbitrarily nested parallel program loops on shared memory multiprocessor systems. Utilizing loop parallelism is clearly most crucial in achieving high system and program performance. This method simultaneously achieves the two most important objectives: load balancing and very low synchronization overhead. For certain types of loops the authors show analytically that guided self-scheduling uses minimal overhead and achieves optimal schedules. The authors discuss experimental results that clearly show the advantage of guided self-scheduling over the most widely known dynamic methods.
IEEE Transactions on Computers 12/1987; 36:1425-1439. DOI:10.1109/TC.1987.5009495
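The chunk-size rule usually associated with guided self-scheduling can be sketched as follows. This is a minimal illustration, assuming the basic rule in which each idle processor takes ceil(R/P) of the R remaining iterations on P processors; the function name `guided_chunks` and its parameters are hypothetical and are not taken from the paper or from LB_Migrate.

```python
def guided_chunks(total_iterations, num_workers):
    """Return the (start, size) chunks in the order they would be handed out.

    Assumed rule: each request receives ceil(remaining / num_workers)
    iterations, so chunks start large and shrink toward the end of the loop.
    """
    if total_iterations < 0 or num_workers < 1:
        raise ValueError("need total_iterations >= 0 and num_workers >= 1")
    chunks = []
    start = 0
    remaining = total_iterations
    while remaining > 0:
        # Integer ceiling division avoids floating-point rounding.
        size = (remaining + num_workers - 1) // num_workers
        chunks.append((start, size))
        start += size
        remaining -= size
    return chunks


if __name__ == "__main__":
    for start, size in guided_chunks(100, 4):
        print(f"iterations {start}..{start + size - 1} ({size} total)")
```

Large early chunks keep the number of scheduling requests low, while the small late chunks let processors that finish early absorb the remaining work, which is how the method aims at both low synchronization overhead and load balance.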
ABSTRACT: HACS, a fourth-order Hermite integrator with the Ahmad-Cohen scheme, is implemented. HACS is self-starting, so its implementation is considerably simpler than that of the original ACS. Compared to ACS, HACS allows time steps twice as long for the same accuracy, and the increase in calculation cost per time step is not very large. The actual gain in speed depends on the hardware and ranges between a factor of one and two. The gain using ACS would be significantly smaller on vector or parallel machines.
Publications of the Astronomical Society of Japan 03/1992; 44:141-151.
Article: The Chaco user's guide. Version 1.0
ABSTRACT: Graph partitioning is a fundamental problem in many scientific settings. This document describes the capabilities and operation of Chaco, a software package designed to partition graphs. Chaco allows for recursive application of any of several different methods for finding small edge separators in weighted graphs. These methods include inertial, spectral, Kernighan-Lin and multilevel methods in addition to several simpler strategies. Each of these methods can be used to partition the graph into two, four or eight pieces at each level of recursion. In addition, the Kernighan-Lin method can be used to improve partitions generated by any of the other methods. Brief descriptions of these methods are provided, along with references to relevant literature. The user interface, input/output formats and appropriate settings for a variety of code parameters are discussed in detail, and some suggestions on algorithm selection are offered.
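The quantity these partitioning methods try to keep small, the total weight of edges crossing between parts, can be illustrated with a short sketch. This is not Chaco's interface or input format; the function `edge_cut`, the edge-list representation, and the example graph are all assumptions made for illustration.

```python
def edge_cut(edges, part):
    """Sum the weights of edges whose endpoints lie in different parts.

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    part:  mapping from vertex to the index of the part it is assigned to.
    """
    return sum(weight for u, v, weight in edges if part[u] != part[v])


if __name__ == "__main__":
    # A weighted 4-cycle: 0-1 and 2-3 are light, 1-2 and 3-0 are heavy.
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 2.0)]
    split_a = {0: 0, 1: 0, 2: 1, 3: 1}  # cuts the two heavy edges
    split_b = {0: 0, 1: 1, 2: 1, 3: 0}  # cuts the two light edges
    print(edge_cut(edges, split_a), edge_cut(edges, split_b))
```

Both bisections put two vertices in each part, but the second cuts less edge weight, which is the kind of difference a refinement pass such as Kernighan-Lin is meant to find.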