Conference Paper

PARLGRAN: Parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures

Center for Embedded Comput. Syst., California Univ., Irvine, CA, USA
DOI: 10.1109/ASPDAC.2006.1594733
Conference: Asia and South Pacific Conference on Design Automation, 2006
Source: IEEE Xplore

ABSTRACT Partial dynamic reconfiguration, often called RTR (run-time reconfiguration), is a key feature in modern reconfigurable platforms. While partial RTR enables additional application performance, it imposes physical constraints that necessitate simultaneous scheduling and placement when mapping application task graphs onto such architectures. In this paper, we present PARLGRAN, an approach that maximizes the performance of application task chains by selecting a suitable granularity of data-parallelism for individual data-parallel tasks. Our approach focuses on reconfiguration delay overhead and placement-related issues (such as fragmentation) while selecting individual data-parallelism granularity as an integral part of simultaneous scheduling and placement. We demonstrate that our heuristic generates high-quality schedules on an extensive set of over 1000 synthetic experiments by comparing the results with an approach that tries to statically maximize data-parallelism, i.e., one that does not consider the overheads and constraints associated with partial RTR. A detailed case study on JPEG encoding additionally confirms that blindly maximizing data-parallelism can result in schedules even worse than those generated by a simple (but RTR-aware) approach oblivious to data-parallelism.
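The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the PARLGRAN heuristic; it is a minimal illustration under assumptions that are not taken from the paper: a single data-parallel task whose work splits evenly across its copies, a single reconfiguration port that loads copies one after another, and an FPGA whose capacity is measured in columns. The function and parameter names (`chain_finish_time`, `best_granularity`, `work`, `reconfig_delay`, `task_columns`, `fpga_columns`) are hypothetical.

```python
def chain_finish_time(work, reconfig_delay, copies):
    """Finish time of one data-parallel task split evenly into `copies` instances.

    Assumption: one reconfiguration port, so instance i (1-based) is only
    configured at time i * reconfig_delay, and then runs for work / copies.
    """
    return max(i * reconfig_delay + work / copies for i in range(1, copies + 1))


def best_granularity(work, reconfig_delay, task_columns, fpga_columns):
    """Return the number of copies that minimizes the finish time.

    Assumption: each copy occupies `task_columns` columns, and all copies
    must fit on the FPGA at the same time.
    """
    max_copies = fpga_columns // task_columns
    if max_copies < 1:
        raise ValueError("task does not fit on the FPGA")
    return min(
        range(1, max_copies + 1),
        key=lambda k: chain_finish_time(work, reconfig_delay, k),
    )


if __name__ == "__main__":
    work, delay = 18, 2
    for k in range(1, 7):
        print(k, chain_finish_time(work, delay, k))
    print("best:", best_granularity(work, delay, task_columns=1, fpga_columns=6))
```

In this toy setting, with 18 time units of work and a reconfiguration delay of 2, three copies finish at time 12 while the maximum of six copies finishes at time 15, because every extra copy adds a serialized reconfiguration. This is only meant to show why maximizing data-parallelism is not automatically the best choice; placement effects such as fragmentation are not modeled.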



Available from: Eli Bozorgzadeh, Jun 23, 2015
  • Source
    ABSTRACT: The aim of this paper is to define a schedule for the task graph of an application that minimizes its total execution time on a partially dynamically reconfigurable FPGA. The scheduler has to take into account the reconfiguration overhead of each task, the area constraint of the target FPGA, the precedences between the tasks, configuration prefetching, and module reuse. We introduce an ILP formulation to solve the task scheduling problem in the reconfigurable architecture scenario. This formulation has been used to identify interesting features for a possible heuristic scheduler. The results of the ILP solution show how a reconfiguration-aware scheduler that exploits all the reconfiguration features can outperform one with only partial knowledge.
    Design, Automation and Test in Europe, DATE 2008, Munich, Germany, March 10-14, 2008; 03/2008
  • Source
    ABSTRACT: In this paper we present an FPGA area allocation algorithm for parallel OpenMP applications that aims to improve performance for a specific reconfiguration area. The algorithm will be implemented in Delft Workbench, a toolchain developed at TU Delft with a focus on reconfigurable architectures. The hardware platform used to gather the experimental results is a Xilinx Virtex II Pro board with a PowerPC 405 processor and an FPGA. Using profiling information and the structure of the application, we construct a mathematical model, which is then used by a modified ILP (Integer Linear Programming) solver to choose the optimal mapping.
  • Source
    ABSTRACT: In this dissertation, we address the problem of runtime adaptation of an application to its execution environment. A typical example is changing the processing element on which a computation is executed, considering the available processing elements in the system. This is done based on the information and instrumentation provided by the compiler and taking into account the status of the environment. The work focuses on heterogeneous multicore embedded architectures. We address three aspects of application optimization: hardware/software mapping, memory allocation, and parallel execution. For each aspect, an algorithm is developed and, using a suitable application, tested on the hardware platform. The programming paradigm on which this work is based is the Molen programming paradigm, extended and adapted for our specific platform and operating environment. The objective of the hardware/software mapping algorithm is to choose, at runtime, the processing element on which it is more efficient to execute a function. For memory allocation, we propose an algorithm that uses information gathered at compile time and the current execution environment to decide, at runtime, on the best allocation for memory. For parallel applications, we developed an algorithm that selects the best trade-off between area and speedup and decides on the number of concurrent units that execute. The experiments were performed on an embedded multicore heterogeneous platform, namely the hArtes Hardware Platform (hHP). This platform contains an ARM processor as General Purpose Processor (GPP), an Atmel Magic Diopsis Digital Signal Processor (DSP), and a Xilinx Virtex4 Field Programmable Gate Array (FPGA). The applications used to validate the algorithms are real-life applications from the multimedia field: a video encoder/decoder and a wavefield synthesis application. The mapping algorithm obtains improvements between 5% and 43%. We show that this algorithm is adaptable: it adjusts the execution when the execution overhead increases. The memory allocation algorithm obtained a speedup of 18% on the selected application. For this algorithm we show that the solution is within 14% of the optimal solution, computed using Integer Linear Programming (ILP). The scenario-based selection of parallel computations is between 21% and 92% better than existing solutions.