Conference Paper

Optimizing MPI Runtime Parameter Settings by Using Machine Learning

DOI: 10.1007/978-3-642-03770-2_26
Conference: 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface


Manually tuning MPI runtime parameters is a common practice for optimising MPI application performance on a specific architecture.
However, the best settings for these parameters depend not only on the underlying system but also on the application itself and its input data.
This paper introduces a novel approach, based on machine learning techniques, that estimates the MPI runtime parameter values expected to achieve optimal speedup for a target architecture and a previously unseen input program.
The effectiveness of our optimization tool is evaluated against two benchmarks executed on a multi-core SMP machine.
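
For context, "runtime parameters" here are the tunables an MPI library exposes without recompilation; Open MPI, for instance, accepts MCA parameters on the mpirun command line via --mca. The Python sketch below illustrates the manual-tuning baseline that a learned predictor replaces: run a benchmark under each candidate setting and keep the fastest. The two parameter names are real Open MPI MCA parameters (availability varies by version); the benchmark binary ./my_mpi_benchmark, the candidate values, and the process count are placeholders.

    # A minimal sketch of exhaustive manual tuning of Open MPI runtime (MCA)
    # parameters. Assumptions: Open MPI's mpirun is on PATH, the parameter
    # names exist in the installed version, and ./my_mpi_benchmark is a
    # placeholder for a real MPI binary.
    import itertools
    import subprocess
    import time

    PARAM_SPACE = {
        "btl_sm_eager_limit": ["1024", "4096", "16384"],  # shared-memory eager limit (bytes)
        "mpi_paffinity_alone": ["0", "1"],                # processor affinity on/off
    }

    def run_once(settings, nprocs=4):
        """Run the benchmark once with the given MCA settings; return wall time."""
        cmd = ["mpirun", "-np", str(nprocs)]
        for name, value in settings.items():
            cmd += ["--mca", name, value]
        cmd.append("./my_mpi_benchmark")
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        return time.perf_counter() - start

    # Try the full Cartesian product of candidate values; keep the fastest.
    best = min(
        (dict(zip(PARAM_SPACE, combo)) for combo in itertools.product(*PARAM_SPACE.values())),
        key=run_once,
    )
    print("fastest setting found:", best)

This brute-force search is what makes manual tuning expensive: its cost grows multiplicatively with each parameter added, which is precisely the motivation for predicting good settings instead of searching for them.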

  • ABSTRACT: MPI implementations provide several hundred runtime parameters that can be tuned for performance improvement. The ideal parameter setting depends not only on the target multiprocessor architecture but also on the application, its problem size and its communicator size. This paper presents ATune, an automatic performance tuning tool that uses machine learning techniques to determine program-specific optimal settings for a subset of Open MPI's runtime parameters. ATune learns the behaviour of a target system by means of a training phase in which several MPI benchmarks and MPI applications are run on the target architecture for varying problem and communicator sizes. For a new input program, only one run is required for ATune to deliver a prediction of the optimal runtime parameter values. Experiments based on the NAS Parallel Benchmarks, performed on a cluster of SMP machines, demonstrate the effectiveness of ATune: the derived MPI runtime parameter settings are on average within 4% of the maximum performance achievable on the target system, resulting in a performance gain of up to 18% with respect to the default parameter setting. (A toy sketch of this one-run prediction step appears as the first example after this list.)
    Proceedings of the 7th ACM international conference on Computing frontiers; 01/2010
  • ABSTRACT: Multicore processors have not only reintroduced Non-Uniform Memory Access (NUMA) architectures in today's parallel computers; they are also responsible for non-uniform access times with respect to Input/Output devices (NUIOA). In clusters of multicore machines equipped with several Network Interfaces, the performance of communication between processes thus depends on which cores these processes are scheduled on and on their distance to the Network Interface Cards involved. We propose a technique allowing multirail communication between processes to carefully distribute data among the network interfaces so as to counterbalance NUIOA effects. We demonstrate the relevance of our approach by evaluating its implementation within OpenMPI on a Myri-10G + InfiniBand cluster. (A toy illustration of bandwidth-proportional splitting appears as the second example after this list.)
  • ABSTRACT: This paper proposes a methodology designed to tackle the most common performance problems of MPI parallel programs. By developing a methodology that applies simple steps in a systematic way, we expect to obtain the basis for a successful autotuning approach for MPI applications based on measurements taken from their own execution. As part of the AutoTune project, our work is ultimately aimed at extending Periscope to apply automatic tuning to parallel applications and thus provide a straightforward way of tuning MPI parallel codes. Experimental tests demonstrate that this methodology can lead to significant performance improvements.
    Proceedings of the 20th European MPI Users' Group Meeting; 09/2013
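
The ATune abstract above describes an offline training phase followed by a one-run prediction for new programs. The following is a minimal sketch of that idea, not ATune's actual implementation: it assumes a small hand-picked feature vector (problem size, communicator size, fraction of time spent in collectives), a decision-tree classifier, and an illustrative table of candidate Open MPI settings; all numbers are made up.

    # Hedged sketch of "train on benchmarks, predict for a new program from
    # one run". Features, model choice, and all values are assumptions for
    # illustration; ATune's real feature set and model are not reproduced here.
    from sklearn.tree import DecisionTreeClassifier

    # Offline phase: one row per training run, labelled with the index of the
    # best-performing setting found by exhaustive search for that run.
    X_train = [
        # [problem size, communicator size, fraction of time in collectives]
        [1 << 16,  4, 0.10],
        [1 << 20,  4, 0.45],
        [1 << 16, 16, 0.12],
        [1 << 20, 16, 0.50],
    ]
    y_train = [0, 1, 0, 2]  # indices into SETTINGS

    SETTINGS = [
        {"btl_sm_eager_limit": "4096"},
        {"btl_sm_eager_limit": "16384"},
        {"btl_sm_eager_limit": "16384", "coll_tuned_use_dynamic_rules": "1"},
    ]

    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Online phase: a single profiling run of the new program yields its
    # feature vector, and the model picks a setting with no further search.
    new_program = [[1 << 20, 8, 0.40]]
    print("predicted parameters:", SETTINGS[model.predict(new_program)[0]])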
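
The NUIOA abstract describes distributing multirail traffic across network interfaces according to their distance from the sending core. As a toy illustration only (the cited work implements its policy inside Open MPI; the bandwidth figures below are invented), splitting a message in proportion to the bandwidth each rail achieves from the sender's NUMA node looks like this:

    # Toy NUIOA-aware multirail split: the NIC that measures slower from the
    # sending core's NUMA node receives proportionally less of the message.
    def split_message(msg_len, rail_bandwidths):
        """Return per-rail chunk lengths proportional to rail bandwidth."""
        total = sum(rail_bandwidths)
        chunks = [msg_len * bw // total for bw in rail_bandwidths]
        chunks[0] += msg_len - sum(chunks)  # assign rounding remainder to rail 0
        return chunks

    # Invented example: the local Myri-10G rail measures 1200 MB/s, the
    # remote InfiniBand rail 800 MB/s for this sender placement.
    print(split_message(1 << 20, [1200, 800]))  # -> [629146, 419430]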