Conference Paper

Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model.

DOI: 10.1007/978-3-540-78791-4_9
Conference: Compiler Construction, 17th International Conference, CC 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29 - April 6, 2008. Proceedings
Source: DBLP

ABSTRACT The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that can improve performance by parallelization as well as locality enhancement. Although a significant body of research has addressed affine scheduling and partitioning, the problem of automatically finding good affine transforms for communication-optimized coarse-grained parallelization together with locality optimization for the general case of arbitrarily-nested loop sequences remains a challenging problem. We propose an automatic transformation framework to optimize arbitrarily-nested loop sequences with affine dependences for parallelism and locality simultaneously. The approach finds good tiling hyperplanes by embedding a powerful and versatile cost function into an Integer Linear Programming formulation. These tiling hyperplanes are used for communication-minimized coarse-grained parallelization as well as for locality optimization. The approach enables the minimization of inter-tile communication volume in the processor space, and minimization of reuse distances for local execution at each node. Programs requiring one-dimensional versus multi-dimensional time schedules (with scheduling-based approaches) are all handled with the same algorithm. Synchronization-free parallelism, permutable loops or pipelined parallelism at various levels can be detected. Preliminary studies of the framework show promising results.
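As a rough illustration of what "finding good tiling hyperplanes" can mean, the sketch below brute-forces small integer hyperplanes instead of solving an Integer Linear Programming problem. Everything in it is an assumption made for illustration: the abstract does not spell out the cost function, so the sketch uses the largest dependence distance measured along the hyperplane as a stand-in cost, restricts coefficients to small non-negative integers, and uses hypothetical constant dependence vectors of the kind a 1-D time-iterated stencil would produce. The helper names (`legal`, `cost`, `find_hyperplanes`) are not from the paper.

```python
from fractions import Fraction
from itertools import product


def dot(h, d):
    """Dependence distance of d measured along hyperplane normal h."""
    return sum(a * b for a, b in zip(h, d))


def legal(h, deps):
    """A hyperplane is usable for tiling here if no dependence points backwards across it."""
    return all(dot(h, d) >= 0 for d in deps)


def cost(h, deps):
    """Stand-in cost (assumption): the largest dependence distance along h."""
    return max(dot(h, d) for d in deps)


def rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def find_hyperplanes(deps, dim, bound=2):
    """Greedily pick `dim` linearly independent legal hyperplanes, cheapest first.

    Brute force over coefficients in [0, bound]; a real framework would
    solve an ILP rather than enumerate.
    """
    candidates = [h for h in product(range(bound + 1), repeat=dim)
                  if any(h) and legal(h, deps)]
    candidates.sort(key=lambda h: (cost(h, deps), sum(h), h))
    chosen = []
    for h in candidates:
        if rank(chosen + [h]) == len(chosen) + 1:
            chosen.append(h)
        if len(chosen) == dim:
            break
    return chosen


# Hypothetical (time, space) dependence vectors for a 3-point stencil.
stencil_deps = [(1, -1), (1, 0), (1, 1)]
hyperplanes = find_hyperplanes(stencil_deps, dim=2)
print(hyperplanes)
```

In this toy setting the space axis `(0, 1)` is rejected because the dependence `(1, -1)` would cross it backwards, which is why a skewed hyperplane is selected alongside the time axis.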

  • ABSTRACT: OpenMP provides an easy-to-learn and powerful programming environment for the development of parallel programs. We propose here an algorithm for the automatic correction of the OpenMP tasking model. Assuming a compiler or programmer has identified task regions in the source programs, the proposed algorithm will automatically generate correct task clauses and synchronization. The proposed algorithm is implemented here based on the ROSE compiler infrastructure; 14 benchmark programs are tested, each of which has had all clauses in the task directives removed for the evaluation. The results of this experimental evaluation show that the proposed technique can successfully generate correct clauses for the tested benchmark programs. The proposed technique can simplify the parallelizing of programs using the OpenMP tasking model, making parallel programming more effective and productive.
    The Journal of Supercomputing 12/2014; DOI:10.1007/s11227-014-1326-3 · 0.84 Impact Factor
  • Conference Paper: Loop transformations
    ABSTRACT: High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit the resources in modern processor architectures. Nevertheless, selecting required compositions of loop transformations to achieve this remains a significantly challenging task; current compilers may be off by orders of magnitude in performance compared to hand-optimized programs. To address this fundamental challenge, we first present a convex characterization of all distinct, semantics-preserving, multidimensional affine transformations. We then bring together algebraic, algorithmic, and performance analysis results to design a tractable optimization algorithm over this highly expressive space. Our framework has been implemented and validated experimentally on a representative set of benchmarks running on state-of-the-art multi-core platforms.
    The 38th Annual ACM SIGPLAN-SIGACT Symposium; 01/2011
  • Computational Mathematics and Mathematical Physics 08/2014; 54(8):1316-1326. DOI:10.1134/S0965542514080077 · 0.59 Impact Factor