Conference Paper

PARLGRAN: parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures

Center for Embedded Computer Systems, University of California, Irvine, CA, USA
DOI: 10.1109/ASPDAC.2006.1594733 In proceedings of: Asia and South Pacific Conference on Design Automation, 2006
Source: IEEE Xplore

ABSTRACT Partial dynamic reconfiguration, often called RTR (run-time reconfiguration), is a key feature in modern reconfigurable platforms. While partial RTR enables additional application performance, it imposes physical constraints that necessitate simultaneous scheduling and placement when mapping application task graphs onto such architectures. In this paper, we present PARLGRAN, an approach that maximizes performance of application task chains by selecting a suitable granularity of data-parallelism for individual data-parallel tasks. Our approach focuses on reconfiguration delay overhead and placement-related issues (such as fragmentation) while selecting individual data-parallelism granularity as an integral part of simultaneous scheduling and placement. We demonstrate that our heuristic generates high-quality schedules on an extensive set of over 1000 synthetic experiments by comparing the results with an approach that tries to statically maximize data-parallelism, i.e., does not consider the overheads and constraints associated with partial RTR. A detailed case study on JPEG encoding additionally confirms that blindly maximizing data-parallelism can result in schedules even worse than those generated by a simple (but RTR-aware) approach oblivious to data-parallelism.
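
The trade-off the abstract describes can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the PARLGRAN heuristic itself: it assumes a single reconfiguration controller that loads the copies of a data-parallel task one after another, an even split of the work across copies, and enough free area for every copy (so placement and fragmentation are ignored). The function names, parameters, and numbers are hypothetical.

```python
def makespan(work, reconfig_delay, copies):
    """Finish time of one data-parallel task split into `copies` instances.

    Assumption: copies are configured sequentially, so the last copy is
    ready at copies * reconfig_delay and then runs for work / copies.
    """
    return copies * reconfig_delay + work / copies


def best_granularity(work, reconfig_delay, max_copies):
    """Number of copies (1..max_copies) that minimizes the toy makespan."""
    return min(range(1, max_copies + 1),
               key=lambda k: makespan(work, reconfig_delay, k))


if __name__ == "__main__":
    work, delay, max_copies = 100, 20, 8  # hypothetical time units
    k = best_granularity(work, delay, max_copies)
    print("no data-parallelism:", makespan(work, delay, 1))
    print("maximum parallelism:", makespan(work, delay, max_copies))
    print("selected granularity:", k, makespan(work, delay, k))
```

With these made-up numbers, instantiating all eight copies takes longer than running a single copy, because the serialized reconfiguration delay outweighs the shorter execution time; an intermediate granularity does better than both.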

  • ABSTRACT: Partial dynamic reconfiguration (often referred to as partial RTR) enables true on-demand computing. In an on-demand computing environment, a dynamically invoked application is assigned resources such as data bandwidth and configurable logic. The limited logic resources are customized during application execution by exploiting partial RTR. In this article, we propose an approach that maximizes application performance when available bandwidth and logic resources are limited. Our proposed approach is based on theoretical principles of minimizing application schedule length under bandwidth and logic resource constraints. It includes detailed microarchitectural considerations on a commercially popular reconfigurable device, and it exploits partial RTR very effectively by utilizing the data-parallelism property of common image-processing applications. We present extensive application case studies on a cycle-accurate simulation platform that includes detailed resource considerations of the Xilinx Virtex XC2V3000. Our experimental results demonstrate that applying our proposed approach to common image-filtering applications leads to a 15-20% performance gain in scenarios with limited bandwidth, when compared to prior work that also exploits data-parallelism with RTR but includes simpler bandwidth considerations. Last but not least, we also demonstrate how our proposed theoretical principles can be directly applied to solve related problems such as minimizing schedule length under logic resource and power constraints.
    TRETS. 01/2010; 3:18.
  • ABSTRACT: This paper proposes a new model for the partitioning and scheduling of a specification on partially dynamically reconfigurable hardware. Although this problem can be solved optimally only by tackling its subproblems jointly, the excessive complexity of such a task leads to a decomposition into two phases. The partitioning phase is based on a new graph-theoretic approach, which aims to obtain near optimality even if performed independently from the subsequent phase. For the scheduling phase, a new integer linear programming formulation and a heuristic approach are developed. Both take into account configuration prefetching and module reuse. The experimental results show that the proposed method compares favorably with existing solutions.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 06/2009; 1.09 Impact Factor
  • ABSTRACT: Heterogeneous reconfigurable systems provide drastically higher performance and lower power consumption than traditional CPU-centric systems. Moreover, they do so at much lower cost and with shorter time to market than non-reconfigurable hardware solutions. They also provide the flexibility that is often required for the engineering of modern robust and adaptive systems. Due to their heterogeneity, flexibility and potential for highly optimized application-specific instantiation, reconfigurable systems are adequate for a very broad class of applications across different industry sectors. What prevents the reconfigurable system paradigm from a broad proliferation is the lack of adequate development methodologies and electronic design tools for this kind of system. The ideal would be a seamless compilation of a high-level computation process specification into an optimized mixture of machine code executed on traditional CPU-centric processors and on the application-specific decentralized parallel data-flow-dominated reconfigurable processors and hardware accelerators. Although much research and development in this direction has recently been performed, the adequate methodologies and tools necessary to implement this compilation process as an effective and efficient hardware/software co-synthesis flow are unfortunately not yet in place. This paper focuses on the recent developments and development trends in the design methods and synthesis tools for reconfigurable systems. Reconfigurable system synthesis performs two basic tasks: system structure construction and application process mapping on the structure. It is thus more complex than standard (multi-)processor-based system synthesis for software-programmable systems, which only involves application mapping. The system structure construction may involve the macro-architecture synthesis, the micro-architecture synthesis, and the actual hardware synthesis. Also, the application process mapping can be more complicated and dynamic in reconfigurable systems. This paper reviews the recent methods and tools for the macro- and micro-architecture synthesis, and for the application mapping of reconfigurable systems. It pays particular attention to the relevant and currently hot topic of (re-)configurable application-specific instruction set processor (ASIP) synthesis, and specifically, ASIP instruction set extension. It also discusses the methods and tools for reconfigurable systems involving CPU-centric processors collaborating with reconfigurable hardware sub-systems. For these systems, the main problem is to decide which computation processes should be implemented in software and which in hardware, and the hardware/software partitioning has to account for the hardware sharing by different computation processes and for the reconfiguration processes. The reconfigurable system area is a very promising but still quite new field, with many open research and development topics. The paper reviews some of the future trends in the reconfigurable system development methods and tools. Finally, the discussion of the paper is summarized and concluded.
    Integration. 01/2010; 43:1-33.
