Chapter

Optimal distribution of loops containing no dependence cycles

Authors:

Abstract

This paper proposes a method for optimizing the parallelization of loops whose dependence graphs are acyclic. Loop distribution is first presented as a way of synchronizing data dependences. The loop distribution optimization problem is then posed and solved using the technique of optimal dependence folding. The theoretical considerations are accompanied by experimental results obtained by applying the proposed optimization method to an example program executed on a distributed-memory parallel supercomputer.
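To make the idea of loop distribution concrete, the sketch below splits a two-statement loop with an acyclic dependence graph into one loop per statement. It is only an illustration of plain loop distribution, not the paper's optimal dependence folding; the statement names `S1` and `S2`, the arrays, and the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def original_loop(b):
    """Sequential loop: S2 reads the value S1 wrote one iteration earlier."""
    n = len(b)
    a, c = [0] * n, [0] * n
    for i in range(1, n):
        a[i] = b[i] + 1        # S1
        c[i] = a[i - 1] * 2    # S2 (loop-carried dependence S1 -> S2)
    return a, c


def distributed_loops(b):
    """Same computation after distribution: one loop per statement."""
    n = len(b)
    a, c = [0] * n, [0] * n

    def s1(i):
        a[i] = b[i] + 1

    def s2(i):
        c[i] = a[i - 1] * 2

    bodies = {"S1": s1, "S2": s2}
    # Acyclic statement dependence graph: each key maps to its predecessors.
    deps = {"S1": set(), "S2": {"S1"}}

    # Run the per-statement loops in a topological order of the graph.
    # Finishing one loop before starting the next acts as the synchronization.
    for stmt in TopologicalSorter(deps).static_order():
        # Inside one distributed loop no iteration depends on another, so
        # any order is valid; reversed order is used here to show that.
        for i in reversed(range(1, n)):
            bodies[stmt](i)
    return a, c
```

Because each distributed loop carries no dependence of its own, its iterations could be handed to different processors, with a barrier between the two loops standing in for the dependence from `S1` to `S2`.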


Article
Many approaches have been described for the parallel loop scheduling problem on shared-memory systems, but little work has been done on the data-dependent loop scheduling problem (nested loops with loop-carried dependencies). In this paper, we propose a general model for the data-dependent loop scheduling problem on distributed as well as shared-memory systems. To achieve load balancing and low runtime scheduling and communication overhead, our model is based on a loop task graph and the notion of critical path. In addition, we develop a heuristic algorithm based on our model and on genetic algorithms to test the reliability of the model. We test our approach on different scenarios and benchmarks. The results are very encouraging and suggest a future parallel compiler implementation based on our model.
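The notion of a critical path in a task graph can be sketched as a longest-path computation over a directed acyclic graph. The example below is a generic illustration under assumed inputs, not the model or the heuristic from the abstract above; the function name `critical_path`, the task names, and the costs are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(costs, preds):
    """Return (length, path) of the longest cost-weighted path in a DAG.

    costs: task -> execution cost
    preds: task -> iterable of tasks that must finish first
           (every task is assumed to appear as a key)
    """
    finish = {}     # earliest finish time of each task
    best_pred = {}  # predecessor that determines that finish time
    for task in TopologicalSorter(preds).static_order():
        # The latest-finishing predecessor decides when this task can start.
        p = max(preds.get(task, ()), key=lambda t: finish[t], default=None)
        best_pred[task] = p
        finish[task] = costs[task] + (finish[p] if p is not None else 0)

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return length, path[::-1]
```

The returned length is a lower bound on the parallel execution time for the assumed costs, which is why a scheduler would typically give priority to tasks on that path.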
Article
In this paper, we propose different approaches for the parallel loop scheduling problem on distributed as well as shared memory systems. Specifically, we propose adaptive loop scheduling models in order to achieve load balancing, low runtime scheduling, low synchronization overhead and low communication overhead. Our models are based on an adaptive determination of the chunk size and an exploitation of the processor affinity property, and consider different situations (central or local queues, and dynamic or static loop partition).
Article
In this paper, we propose a general model for the data-dependent loop scheduling problem on distributed as well as shared-memory systems. To achieve load balancing and low runtime scheduling and communication overhead, our model is based on a loop task graph and the notion of critical path. In addition, we develop a heuristic algorithm based on our model and on genetic algorithms to test the reliability of the model. We test our approach on different scenarios and benchmarks. The results are very encouraging and suggest a future parallel compiler implementation based on our model.
Conference Paper
A general and optimal algorithm for loop distribution when control flow is present is proposed. The algorithm can be used to enhance the effectiveness of vectorizers, parallelizers, and programming environments. The method performs loop distribution in the presence of control flow based on control dependences. The algorithm is optimal in that it generates the minimum possible number of new arrays and tests. A code generation algorithm that produces code for the resulting program without replicating statements or conditions is also presented.