Conference Paper

Active pebbles: Parallel programming for data-driven applications

DOI: 10.1145/1995896.1995934 Conference: Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31 - June 04, 2011
Source: DBLP


The scope of scientific computing continues to grow and now includes diverse application areas such as network analysis, combinatorial computing, and knowledge discovery, to name just a few. Large problems in these application areas require HPC resources, but they exhibit computation and communication patterns that are irregular, fine-grained, and non-local, making it difficult to apply traditional HPC approaches to achieve scalable solutions. In this paper we present Active Pebbles, a programming and execution model developed explicitly to enable the development of scalable software for these emerging application areas. Our approach relies on five main techniques--scalable addressing, active routing, message coalescing, message reduction, and termination detection--to separate algorithm expression from communication optimization. Using this approach, algorithms can be expressed in their natural forms, at their natural levels of granularity, while the optimizations necessary for scalability are applied automatically to match the characteristics of particular machines. We implement several example kernels using both Active Pebbles and existing programming models, evaluating both programmability and performance. Our experimental results demonstrate that the Active Pebbles model can succinctly and directly express irregular application kernels, while still achieving performance comparable to MPI-based implementations that are significantly more complex.
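One of the five techniques, message coalescing, can be illustrated with a short sketch: fine-grained sends to the same destination are buffered and flushed as one combined network message once a threshold is reached. This is a minimal illustration with a hypothetical `CoalescingChannel` API, not the actual Active Pebbles or AM++ interface.

```python
from collections import defaultdict

class CoalescingChannel:
    """Sketch of message coalescing: buffer fine-grained active
    messages per destination, flush a full buffer as one combined
    message (hypothetical API, for illustration only)."""

    def __init__(self, handler, threshold=4):
        self.handler = handler          # invoked once per delivered pebble
        self.threshold = threshold      # coalescing buffer size
        self.buffers = defaultdict(list)
        self.sent_batches = []          # stands in for actual network sends

    def send(self, dest, pebble):
        buf = self.buffers[dest]
        buf.append(pebble)
        if len(buf) >= self.threshold:
            self.flush(dest)

    def flush(self, dest):
        buf = self.buffers.pop(dest, [])
        if buf:
            self.sent_batches.append((dest, list(buf)))  # one network message
            for pebble in buf:                           # receiver-side delivery
                self.handler(dest, pebble)

    def flush_all(self):
        for dest in list(self.buffers):
            self.flush(dest)

# Usage: 10 fine-grained sends to rank 1 become 3 coalesced messages.
received = []
chan = CoalescingChannel(lambda dest, p: received.append(p), threshold=4)
for i in range(10):
    chan.send(1, i)
chan.flush_all()
```

The point of the sketch is the separation the abstract describes: the algorithm issues one `send` per pebble at its natural granularity, while the buffering threshold is a machine-specific tuning knob applied underneath.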



Available from: Andrew Lumsdaine
  • Source
    • "These applications are referred to as data-intensive applications that cover a wide range of disciplines, including astronomy, astrophysics, bioinformatics, data analytics, data mining, and MPI ensembles [1]. The big data phenomenon [2] has expedited the evolution of a paradigm shift from a compute-centric model to a data-driven one [3]. "
    ABSTRACT: Scientific applications are ushering in the era of big data, which has expedited a paradigm shift from a compute-centric model to a data-driven one. The many-task computing (MTC) paradigm follows a data-driven model and aims to address the challenges of scheduling data-intensive workloads through over-decomposition. MATRIX is a distributed scheduler for fine-grained data-intensive MTC applications. We have evaluated MATRIX on the BG/P machine up to 4K cores, and on the Kodiak cluster up to 200 cores. We propose to integrate the Swift workflow engine with MATRIX to enable MATRIX to run many more scientific applications in the Cloud. We also plan to replace the centralized schedulers of Hadoop clusters, such as YARN and Mesos, with MATRIX to support running data-intensive applications in data center and Cloud domains. We believe that the two recently funded Cloud testbeds, namely Chameleon and CloudLab, would offer great platforms for our experiments.
    Full-text · Conference Paper · Jan 2014
  • Source
    • "Although this approach has been tremendously successful for scientific applications based on discretized PDEs, it is not well-suited for graph-based, data-intensive applications [1]. To address these issues, we have developed an approach for portably expressing high-performance graph algorithms based on fine-grained generalized active messages, as provided by the Active Pebbles programming model [5]. "
    ABSTRACT: Recently, graph computation has emerged as an important class of high-performance computing application whose characteristics differ markedly from those of traditional, compute-bound kernels. Libraries such as BLAS, LAPACK, and others have been successful in codifying best practices in numerical computing. The data-driven nature of graph applications necessitates a more complex application stack incorporating runtime optimization. In this paper, we present a method of phrasing graph algorithms as collections of asynchronous, concurrently executing, concise code fragments which may be invoked both locally and in remote address spaces. A runtime layer performs a number of dynamic optimizations, including message coalescing, message combining, and software routing. Practical implementations and performance results are provided for a number of representative algorithms.
    Full-text · Conference Paper · Feb 2013
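The "concise code fragment" phrasing in the abstract above can be sketched as a breadth-first search expressed as a single message handler: each `visit` message carries a vertex and a distance, and the handler re-sends messages for that vertex's neighbors. This is an illustrative single-process sketch, not the paper's implementation; a local queue stands in for a real runtime's network delivery, coalescing, and routing.

```python
from collections import deque

def bfs_active_messages(adj, source):
    """BFS phrased as an active-message handler over 'visit' messages.
    The loop drains the pending-message queue; the computation ends
    when no messages remain in flight (a stand-in for distributed
    termination detection)."""
    dist = {}
    pending = deque([(source, 0)])      # in-flight active messages

    def visit_handler(v, d):
        if v not in dist or d < dist[v]:
            dist[v] = d
            for w in adj.get(v, []):
                pending.append((w, d + 1))   # fine-grained sends

    while pending:
        v, d = pending.popleft()
        visit_handler(v, d)
    return dist

# Usage: a small diamond-shaped graph.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
```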
  • Source
    • "Avalanche is built on top of the Active Pebbles programming and execution model [50], and uses the AM++ implementation of that model [49] as its underlying infrastructure. The version of AM++ used in this work uses the Message Passing Interface (MPI) standard [34] as its underlying communication mechanism, although it could be re-targeted to a lower-level message passing mechanism such as InfiniBand Verbs. "
    ABSTRACT: Flow graph models have recently become increasingly popular as a way to express parallel computations. However, most of these models either require specialized languages and compilers or are library-based solutions requiring coarse-grained applications to achieve acceptable performance. Yet, graph algorithms and other irregular applications are increasingly important to modern high-performance computing, and these applications are not amenable to coarsening without complicating algorithm structure. One effective existing approach for these applications relies on active messages. However, the separation of control flow between the main program and active message handlers introduces programming difficulties. To ameliorate this problem, we present Avalanche, a flow graph model for fine-grained applications that automatically generates active-message handlers. Avalanche is built as a C++ library on top of our previously-developed Active Pebbles model; a set of combinators builds graphs at compile-time, allowing several optimizations to be applied by the library and a standard C++ compiler. In particular, consecutive flow graph nodes can be fused; experimental results show that flow graphs built from small components can still efficiently operate on fine-grained data.
    Full-text · Article · Jan 2012
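The node-fusion idea from the Avalanche abstract above can be sketched with a tiny combinator: chaining two flow-graph nodes composes their stage functions into one node, so the intermediate value never crosses a handler boundary. This is a loose Python analogy to what Avalanche does at compile time with C++ templates; the `Node` class and `>>` operator are hypothetical, not Avalanche's API.

```python
class Node:
    """Illustrative flow-graph node wrapping a per-item function."""

    def __init__(self, fn):
        self.fn = fn

    def __rshift__(self, other):
        # Fusion: compose the two stage functions into a single node,
        # so chained stages execute in one handler invocation.
        return Node(lambda x: other.fn(self.fn(x)))

    def run(self, items):
        return [self.fn(x) for x in items]

# Usage: two small stages fuse into one pipeline node.
scale = Node(lambda x: 2 * x)
shift = Node(lambda x: x + 1)
pipeline = scale >> shift          # one fused node, not two
```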