Conference Paper

PARLGRAN: parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures

Center for Embedded Comput. Syst., California Univ., Irvine, CA, USA
DOI: 10.1109/ASPDAC.2006.1594733
Conference: Asia and South Pacific Conference on Design Automation, 2006
Source: IEEE Xplore

ABSTRACT Partial dynamic reconfiguration, often called RTR (run-time reconfiguration), is a key feature in modern reconfigurable platforms. While partial RTR enables additional application performance, it imposes physical constraints that necessitate simultaneous scheduling and placement when mapping application task graphs onto such architectures. In this paper, we present PARLGRAN, an approach that maximizes the performance of application task chains by selecting a suitable granularity of data-parallelism for individual data-parallel tasks. Our approach focuses on reconfiguration delay overhead and placement-related issues (such as fragmentation) while selecting individual data-parallelism granularity as an integral part of simultaneous scheduling and placement. We demonstrate that our heuristic generates high-quality schedules on an extensive set of over 1000 synthetic experiments by comparing the results with an approach that tries to statically maximize data-parallelism, i.e., one that does not consider the overheads and constraints associated with partial RTR. A detailed case study on JPEG encoding additionally confirms that blindly maximizing data-parallelism can result in schedules even worse than those generated by a simple (but RTR-aware) approach oblivious to data-parallelism.
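The trade-off the abstract describes can be illustrated with a small toy model. The sketch below is not the PARLGRAN heuristic and ignores placement and fragmentation entirely. It assumes that a data-parallel task's work divides evenly across k instances, that every instance incurs the same reconfiguration delay, and that reconfigurations are loaded one after another. The names `makespan` and `best_granularity` and all numbers are hypothetical.

```python
def makespan(work, reconfig_delay, k):
    """Finish time of one data-parallel task split evenly over k instances.

    Toy assumption: instances are reconfigured one after another, so
    instance i (1-based) is ready at i * reconfig_delay and then runs
    for work / k. The last instance to be loaded finishes last.
    """
    return k * reconfig_delay + work / k


def best_granularity(work, reconfig_delay, max_instances):
    """Return (k, finish_time) with the smallest toy makespan.

    max_instances stands in for an area limit (hypothetical parameter).
    """
    candidates = range(1, max_instances + 1)
    k = min(candidates, key=lambda n: makespan(work, reconfig_delay, n))
    return k, makespan(work, reconfig_delay, k)


if __name__ == "__main__":
    work, reconfig_delay, max_instances = 50.0, 10.0, 8
    # Granularity chosen with reconfiguration overhead in mind.
    print(best_granularity(work, reconfig_delay, max_instances))
    # Blindly using every available instance.
    print(makespan(work, reconfig_delay, max_instances))
    # Ignoring data-parallelism altogether (a single instance).
    print(makespan(work, reconfig_delay, 1))
```

With these made-up numbers, the overhead-aware choice uses two instances, while using all eight finishes later than a single instance would, which mirrors the qualitative claim in the abstract under this simplified model.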

  • Source
    ABSTRACT: Heterogeneous reconfigurable systems provide drastically higher performance and lower power consumption than traditional CPU-centric systems. Moreover, they do so at much lower cost and with shorter time to market than non-reconfigurable hardware solutions. They also provide the flexibility that is often required for the engineering of modern robust and adaptive systems. Due to their heterogeneity, flexibility and potential for highly optimized application-specific instantiation, reconfigurable systems suit a very broad class of applications across different industry sectors. What prevents the reconfigurable system paradigm from proliferating broadly is the lack of adequate development methodologies and electronic design tools for this kind of system. The ideal would be a seamless compilation of a high-level computation process specification into an optimized mixture of machine code executed on traditional CPU-centric processors and on the application-specific decentralized parallel data-flow-dominated reconfigurable processors and hardware accelerators. Although much research and development in this direction has recently been performed, the methodologies and tools necessary to implement this compilation process as an effective and efficient hardware/software co-synthesis flow are unfortunately not yet in place. This paper focuses on the recent developments and development trends in the design methods and synthesis tools for reconfigurable systems. Reconfigurable system synthesis performs two basic tasks: system structure construction and application process mapping on the structure. It is thus more complex than standard (multi-)processor-based system synthesis for software-programmable systems, which involves only application mapping. The system structure construction may involve the macro-architecture synthesis, the micro-architecture synthesis, and the actual hardware synthesis. Also, the application process mapping can be more complicated and dynamic in reconfigurable systems. This paper reviews the recent methods and tools for the macro- and micro-architecture synthesis, and for the application mapping of reconfigurable systems. It pays particular attention to the currently active topic of (re-)configurable application-specific instruction set processor (ASIP) synthesis, and specifically to ASIP instruction set extension. It also discusses the methods and tools for reconfigurable systems in which CPU-centric processors collaborate with reconfigurable hardware sub-systems. For these systems the main problem is to decide which computation processes should be implemented in software and which in hardware, and the hardware/software partitioning has to account for hardware sharing by different computation processes and for the reconfiguration processes. The reconfigurable system area is a very promising but quite new field, with many open research and development topics. The paper reviews some of the future trends in the reconfigurable system development methods and tools. Finally, the discussion is summarized and concluded.
    Integration. 01/2010; 43:1-33.
  • ABSTRACT: Power consumption is a key concern on modern reconfigurable architectures. In this paper, we address the problem of minimizing peak power while mapping application task chains onto reconfigurable architectures with partial dynamic reconfiguration capability. Our proposed methodology minimizes peak power for a given timing constraint. It is based on detailed data-parallelism considerations to ensure that tight timing constraints are met. Our methodology generates physically placed task execution schedules and includes selection of a suitable number of data-parallel instances for each task, a suitable clock frequency, and an execution workload for each task instance. Case studies on real image-filtering applications demonstrate that our approach results in significant peak power savings (between 40% and 50%) for tight as well as relaxed timing constraints.
    Field Programmable Technology, 2006. FPT 2006. IEEE International Conference on; 01/2007
  • Source
    ABSTRACT: The focus of this dissertation is on kernel loops (K-loops), which are loop nests that contain hardware-mapped kernels in the loop body. In this thesis, we propose methods for improving the performance of such K-loops by using standard loop transformations to expose and exploit coarse-grain loop-level parallelism. We target a reconfigurable architecture that is a heterogeneous system consisting of a general-purpose processor and a field-programmable gate array (FPGA). Research projects targeting reconfigurable architectures try to answer several questions: how to partition the application (deciding which parts to accelerate on the FPGA), how to optimize these parts (the kernels), and what performance gain results. However, only a few try to exploit coarse-grain loop-level parallelism. This work goes towards automatically deciding the number of kernel instances to place into the reconfigurable hardware, in a flexible way that can balance area and performance. In this dissertation, we propose a general framework that helps determine the optimal degree of parallelism for each hardware-mapped kernel within a K-loop, taking into account area, memory size and bandwidth, and performance considerations. In the future it can also take power into account. Furthermore, we present algorithms and mathematical models for several loop transformations in the context of K-loops. The algorithms are used to determine the best degree of parallelism for a given K-loop, while the mathematical models are used to determine the corresponding performance improvement. The algorithms are validated with experimental results. The loop transformations that we analyze in this thesis are loop unrolling, loop shifting, K-pipelining, loop distribution, and loop skewing. An algorithm that decides which transformations to use for a given K-loop is also provided. Finally, we present an analysis of possible situations and justifications of when and why the loop transformations do or do not have a significant impact on K-loop performance.
