Conference Paper

Optimizing MPI Runtime Parameter Settings by Using Machine Learning

DOI: 10.1007/978-3-642-03770-2_26
Conference: 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

ABSTRACT: Manually tuning MPI runtime parameters is a practice commonly employed to optimize MPI application performance on a specific architecture.
However, the best setting for these parameters depends not only on the underlying system but also on the application itself and its input data.
This paper introduces a novel approach, based on machine learning techniques, that estimates values for MPI runtime parameters with the aim of achieving optimal speedup on a target architecture for any previously unseen input program.
The effectiveness of our optimization tool is evaluated on two benchmarks executed on a multi-core SMP machine.
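The abstract does not name the learning model, so the following is only a minimal sketch of the general idea under stated assumptions: each program is described by a numeric feature vector, the best runtime setting for each training program has already been found by search, and a simple k-nearest-neighbour vote (swapped in here for illustration, not necessarily the paper's technique) predicts a setting for an unseen program. The function `predict_setting`, the feature values, and the parameter names `eager_limit` and `use_shm` are hypothetical placeholders, not real MPI runtime parameter names.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical training data: (program feature vector, best runtime setting found by search).
# Feature values and parameter names are illustrative placeholders, not from the paper.
TrainingSet = List[Tuple[Sequence[float], Dict[str, int]]]


def predict_setting(training: TrainingSet, features: Sequence[float], k: int = 3) -> Dict[str, int]:
    """Predict a runtime-parameter setting for an unseen program via k-nearest neighbours.

    Each parameter takes the value most common among the k training programs whose
    feature vectors are closest (Euclidean distance) to `features`; ties go to the
    value held by the nearer neighbour.
    """
    if not training or k < 1:
        raise ValueError("need a non-empty training set and k >= 1")
    neighbours = sorted(training, key=lambda rec: math.dist(rec[0], features))[:k]
    prediction: Dict[str, int] = {}
    for param in neighbours[0][1]:
        votes: Dict[int, int] = {}
        for _, setting in neighbours:
            votes[setting[param]] = votes.get(setting[param], 0) + 1
        top = max(votes.values())
        # Neighbours are in distance order, so the first match is the nearest tied value.
        prediction[param] = next(s[param] for _, s in neighbours if votes[s[param]] == top)
    return prediction


# Usage with made-up data (hypothetical features and settings).
training: TrainingSet = [
    ((0.9, 0.1), {"eager_limit": 4096, "use_shm": 1}),
    ((0.8, 0.2), {"eager_limit": 4096, "use_shm": 1}),
    ((0.1, 0.9), {"eager_limit": 65536, "use_shm": 0}),
    ((0.2, 0.8), {"eager_limit": 65536, "use_shm": 1}),
]
print(predict_setting(training, (0.85, 0.15)))  # → {'eager_limit': 4096, 'use_shm': 1}
```

k-nearest neighbours is used here only because it needs no separate training step; a per-parameter classifier such as a decision tree would be an equally plausible choice for this kind of prediction.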

  • ABSTRACT: This paper proposes a methodology designed to tackle the most common performance problems of MPI parallel programs. By developing a methodology that applies simple steps in a systematic way, we expect to lay the basis for a successful autotuning approach for MPI applications based on measurements taken from their own execution. As part of the AutoTune project, our work is ultimately aimed at extending Periscope to apply automatic tuning to parallel applications and thus provide a straightforward way of tuning MPI parallel codes. Experimental tests demonstrate that this methodology could lead to significant performance improvements.
    Proceedings of the 20th European MPI Users' Group Meeting; 09/2013
  • ABSTRACT: The process of empirical autotuning results in the generation of many code variants, which are tested, found to be suboptimal, and discarded. By retaining annotated performance profiles of each variant tested over the course of many autotuning runs of the same code across different hardware environments and different input datasets, we can apply machine learning algorithms to generate classifiers for runtime selection of code variants from a library, generate specialized variants, and potentially speed up autotuning by starting the search from a point predicted to be close to optimal. In this paper, we show how the TAU Performance System suite of tools can be applied to autotuning to enable reuse of performance data generated through autotuning.
    International Journal of High Performance Computing Applications 11/2013; 27(4):403-411. · 1.30 Impact Factor
  • ABSTRACT: Multicore processors have not only reintroduced Non-Uniform Memory Access (NUMA) architectures into today's parallel computers, but are also responsible for non-uniform access times with respect to Input/Output devices (NUIOA). In clusters of multicore machines equipped with several network interfaces, the performance of communication between processes thus depends on which cores these processes are scheduled on and on their distance to the Network Interface Cards involved. We propose a technique that lets multirail communication between processes carefully distribute data among the network interfaces so as to counterbalance NUIOA effects. We demonstrate the relevance of our approach by evaluating its implementation within OpenMPI on a Myri-10G + InfiniBand cluster.
    The 17th European MPI Users' Group Conference; 01/2010
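The last abstract above describes distributing data among several network interfaces to counterbalance NUIOA effects. As a minimal sketch of that idea only (not the OpenMPI implementation the authors evaluated), the following assumes the effective bandwidth from the sending core to each NIC has already been estimated, and splits a message in proportion to those bandwidths using the largest-remainder method so the chunk sizes always sum to the message size. The function name `split_message` and the bandwidth figures are hypothetical.

```python
from typing import List, Sequence


def split_message(size: int, bandwidths: Sequence[float]) -> List[int]:
    """Split `size` bytes across NICs in proportion to each NIC's effective bandwidth.

    Uses the largest-remainder method: every NIC gets the floor of its exact share,
    and the bytes lost to rounding go to the NICs with the largest fractional parts,
    so the chunks always sum exactly to `size`.
    """
    if size < 0 or not bandwidths or any(b <= 0 for b in bandwidths):
        raise ValueError("need size >= 0 and positive bandwidths")
    total = sum(bandwidths)
    exact = [size * b / total for b in bandwidths]
    chunks = [int(x) for x in exact]  # floor, since every share is non-negative
    leftover = size - sum(chunks)
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - chunks[i], reverse=True)
    for i in by_remainder[:leftover]:
        chunks[i] += 1
    return chunks


# Hypothetical effective bandwidths (same arbitrary unit) for two NICs as seen from
# the sending core; the nearer NIC is assumed to be faster.
print(split_message(1_600_000, [10.0, 6.0]))  # → [1000000, 600000]
```

Proportional splitting lets all rails finish at roughly the same time only if bandwidth is the sole cost; this sketch ignores per-message latency and any contention between rails.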

Full text available from May 15, 2014.