Conference Paper

Optimizing MPI Runtime Parameter Settings by Using Machine Learning

DOI: 10.1007/978-3-642-03770-2_26 Conference: 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface


Manually tuning MPI runtime parameters is a practice commonly employed to optimise MPI application performance on a specific architecture.
However, the best setting for these parameters depends not only on the underlying system but also on the application itself and its input data.
This paper introduces a novel approach based on machine learning techniques that estimates the MPI runtime parameter values expected to achieve optimal speedup for a target architecture and any unseen input program.
The effectiveness of our optimisation tool is evaluated on two benchmarks executed on a multi-core SMP machine.
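As background for what "runtime parameter settings" means in practice: Open MPI exposes its tunables as MCA parameters that can be overridden per run on the `mpirun` command line. The sketch below builds such a command line; `btl_sm_eager_limit` is a real Open MPI shared-memory parameter, but the value shown is illustrative, not a tuned setting.

```python
import shlex

def mpirun_cmd(program, nprocs, mca_params):
    """Build an Open MPI command line that overrides runtime (MCA) parameters.

    Each entry in mca_params becomes a `--mca name value` pair, which is how
    Open MPI accepts per-run parameter settings without recompilation.
    """
    cmd = ["mpirun", "-np", str(nprocs)]
    for name, value in mca_params.items():
        cmd += ["--mca", name, str(value)]
    cmd.append(program)
    return cmd

# e.g. raise the shared-memory eager limit for a 4-process run
print(shlex.join(mpirun_cmd("./my_app", 4, {"btl_sm_eager_limit": 8192})))
# → mpirun -np 4 --mca btl_sm_eager_limit 8192 ./my_app
```

A tuning tool in the spirit of this paper would emit such command lines with different parameter values and time the resulting runs.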



Available from: Simone Pellegrini
  • ABSTRACT: MPI implementations provide several hundred runtime parameters that can be tuned for performance improvement. The ideal parameter setting depends not only on the target multiprocessor architecture but also on the application, its problem size, and its communicator size. This paper presents ATune, an automatic performance tuning tool that uses machine learning techniques to determine program-specific optimal settings for a subset of Open MPI's runtime parameters. ATune learns the behaviour of a target system by means of a training phase in which several MPI benchmarks and applications are run on the target architecture for varying problem and communicator sizes. For a new input program, only one run is required for ATune to deliver a prediction of the optimal runtime parameter values. Experiments based on the NAS Parallel Benchmarks, performed on a cluster of SMP machines, demonstrate the effectiveness of ATune: the derived MPI runtime parameter settings are on average within 4% of the maximum performance achievable on the target system, resulting in a performance gain of up to 18% with respect to the default parameter setting.
    No preview · Conference Paper · Jan 2010
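The abstract does not describe ATune's model, but its prediction step (one run of a new program, whose features are mapped to previously learned settings) can be caricatured as a nearest-neighbour lookup over the training data. All feature names, training points, and parameter values below are hypothetical.

```python
def predict_params(trained, problem_size, nprocs):
    """Return the parameter setting recorded for the training point closest
    to the new program, using a relative distance over the two features.

    `trained` maps (problem_size, communicator_size) -> {mca_param: value};
    the contents are made-up stand-ins for a real training database.
    """
    def dist(key):
        ps, cs = key
        return (abs(ps - problem_size) / max(ps, problem_size)
                + abs(cs - nprocs) / max(cs, nprocs))
    return trained[min(trained, key=dist)]

# hypothetical training database collected during the training phase
trained = {
    (1024, 4):  {"btl_sm_eager_limit": 4096},
    (8192, 16): {"btl_sm_eager_limit": 16384},
}
print(predict_params(trained, 9000, 16))
```

A real tool would use a richer feature vector and a learned model rather than raw lookup, but the input/output shape is the same: program features in, parameter setting out.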
  • Source
    ABSTRACT: Multicore processors have not only reintroduced Non-Uniform Memory Access (NUMA) architectures in today's parallel computers, but they are also responsible for non-uniform access times with respect to Input/Output devices (NUIOA). In clusters of multicore machines equipped with several network interfaces, the performance of communication between processes thus depends on which cores these processes are scheduled on and on their distance to the Network Interface Cards involved. We propose a multirail communication technique that carefully distributes data among the network interfaces so as to counterbalance NUIOA effects. We demonstrate the relevance of our approach by evaluating its implementation within Open MPI on a Myri-10G + InfiniBand cluster.
    Preview · Article · Sep 2010
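A minimal sketch of the counterbalancing idea: split each message across the rails in proportion to the effective bandwidth each rail offers from the sending core, so a NUIOA-distant interface carries a smaller share. The bandwidth figures are made up; the actual Open MPI implementation described in the paper is more involved.

```python
def split_across_rails(msg_len, rail_bandwidths):
    """Split a message of msg_len bytes across several network rails in
    proportion to each rail's effective (NUIOA-adjusted) bandwidth.

    Integer division may leave a few bytes over; they go to the last rail
    so the chunks always sum to msg_len.
    """
    total = sum(rail_bandwidths)
    chunks = [msg_len * bw // total for bw in rail_bandwidths]
    chunks[-1] += msg_len - sum(chunks)
    return chunks

# a core "close" to rail 0 sees 10 Gb/s there but only 6 Gb/s on rail 1
print(split_across_rails(1_000_000, [10, 6]))
# → [625000, 375000]
```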
  • Source
    ABSTRACT: The performance of MPI applications on parallel computers can be considerably improved by tuning the runtime parameters provided by modern MPI libraries. However, due to the large and increasing number of tunable parameters, finding a parameter setting that optimizes the execution of several user programs on a chosen target machine is challenging. Existing tools execute input programs multiple times with varying parameter settings until a satisfactory performance level is reached. Several hundred runs of the input programs are nevertheless needed, making this approach appealing only when the cost of the tuning phase can be amortized over many runs of the optimized programs. In this paper, we introduce a novel technique for tuning MPI runtime parameter values to better suit the underlying system architecture. The parameter values are determined by performing an analysis of variance (ANOVA) on experimental data collected by randomly exploring the optimization space of a set of computational kernels commonly employed in High Performance Computing (HPC). We use this technique to derive optimized values for 27 runtime parameters of the Open MPI library on two different parallel architectures. Results show an average performance improvement of up to 20% for codes from the SPEC MPI 2007 benchmark suite with respect to Open MPI's default parameter setting.
    Full-text · Conference Paper · Sep 2012
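The ANOVA step above can be illustrated with a pure-Python one-way F statistic: group the measured execution times by the tested values of one runtime parameter, and a large F (between-group variance dominating within-group variance) suggests that parameter significantly affects performance. The timings below are fabricated for illustration, not measurements from the paper.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of timings.

    F = (between-group mean square) / (within-group mean square);
    larger values mean the grouping variable (here, one runtime
    parameter's setting) explains more of the variance in run time.
    """
    k = len(groups)                              # number of settings tested
    n = sum(len(g) for g in groups)              # total number of runs
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# fabricated timings (seconds) for two settings of one parameter
fast = [10.1, 10.3, 9.9]
slow = [12.0, 11.8, 12.2]
print(anova_f([fast, slow]))
```

The paper's technique ranks the 27 parameters by such significance over randomly sampled settings, then fixes values for the influential ones.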