Conference Paper

PARLGRAN: Parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures

Center for Embedded Computer Systems, University of California, Irvine, CA, USA
DOI: 10.1109/ASPDAC.2006.1594733
Conference: Asia and South Pacific Conference on Design Automation, 2006
Source: IEEE Xplore


Partial dynamic reconfiguration, often called RTR (run-time reconfiguration), is a key feature of modern reconfigurable platforms. While partial RTR enables additional application performance, it imposes physical constraints that necessitate simultaneous scheduling and placement when mapping application task graphs onto such architectures. In this paper, we present PARLGRAN, an approach that maximizes the performance of application task chains by selecting a suitable granularity of data-parallelism for each data-parallel task. Our approach accounts for reconfiguration delay overhead and placement-related issues (such as fragmentation) while selecting each task's data-parallelism granularity as an integral part of simultaneous scheduling and placement. We demonstrate that our heuristic generates high-quality schedules on an extensive set of over 1000 synthetic experiments by comparing the results with an approach that tries to statically maximize data-parallelism, i.e., one that does not consider the overheads and constraints associated with partial RTR. A detailed case study on JPEG encoding additionally confirms that blindly maximizing data-parallelism can result in schedules even worse than those generated by a simple (but RTR-aware) approach oblivious to data-parallelism.
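
The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the PARLGRAN heuristic. It assumes a single reconfiguration port that loads copies one after another, an even split of the work across copies, and no area or placement constraints. The names `schedule_length` and `best_granularity` are hypothetical.

```python
def schedule_length(work, reconfig_delay, copies):
    """Finish time of one data-parallel task split evenly over `copies` instances.

    Assumed model (not taken from the paper): one reconfiguration port loads
    the copies sequentially, each load takes `reconfig_delay`, and a copy
    starts executing its share of the work as soon as it is loaded.
    """
    share = work / copies
    # Copy i (0-based) is ready at (i + 1) * reconfig_delay and then runs `share`.
    return max((i + 1) * reconfig_delay + share for i in range(copies))


def best_granularity(work, reconfig_delay, max_copies):
    """Number of copies in 1..max_copies with the shortest schedule length."""
    return min(range(1, max_copies + 1),
               key=lambda k: schedule_length(work, reconfig_delay, k))


if __name__ == "__main__":
    work, delay, max_copies = 90, 10, 8
    for k in range(1, max_copies + 1):
        print(k, schedule_length(work, delay, k))
    print("best:", best_granularity(work, delay, max_copies))
```

Under these assumptions, with 90 units of work and a reconfiguration delay of 10, three copies finish at time 60 while eight copies finish at 91.25, so the maximum degree of parallelism is not the best choice once reconfiguration overhead is counted.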


Available from: Eli Bozorgzadeh
  • Source
    • "The goal in [13] is to define a specific methodology for scheduling the tasks of these applications in order to reduce the overall completion time. The same authors present in [7] an enhanced solution for the same problem: PARLGRAN tries to reduce the total execution time using two different techniques. The former one is called simple fragmentation reduction and it places the new task in the first available area on the FPGA in the opposite side of the FPGA with respect to the location of the previous task. "
    ABSTRACT: The aim of this paper is to define a schedule for the task graph of an application that minimizes its total execution time on a partially dynamically reconfigurable FPGA. The scheduler has to take into account the reconfiguration overhead of each task, the area constraint of the target FPGA, the precedence constraints between tasks, configuration prefetching, and module reuse. We introduce an ILP formulation to solve the task scheduling problem in the reconfigurable architecture scenario. This formulation has been used to identify interesting features for a possible heuristic scheduler. The results of the ILP solution show how a reconfiguration-aware scheduler exploiting all the reconfiguration features can outperform one with partial knowledge.
    Full-text · Conference Paper · Mar 2008
  • Source
    • "We first briefly review key principles of exploiting data-parallelism with partial RTR, borrowed from [2]. Assuming that enough hardware logic is available, instantiating three copies of a data-parallel task T1 leads to performance improvement (shorter schedule length) as shown in Fig 6 and Fig 7. "
    ABSTRACT: Partial dynamic reconfiguration (often referred to as partial RTR) enables true on-demand computing. A dynamically invoked application is assigned resources such as data bandwidth and configurable logic, and the limited logic resources are customized during application execution with partial RTR. In this work, we present key theoretical principles for maximizing application performance when available bandwidth is limited. We exploit bandwidth effectively by selecting a suitable clock frequency for each task, and we maximize performance with partial RTR by exploiting the data-parallelism property of common image-processing tasks. Our theoretical principles are integrated in our scheduling strategy, SCHEDRTR. We present detailed application case studies on a cycle-accurate simulation platform that addresses microarchitectural concerns and includes detailed resource considerations of the Virtex XC2V3000 device. Our results demonstrate that applying SCHEDRTR to common image-filtering applications leads to a 15-20% performance gain in scenarios with limited bandwidth, when compared to a sophisticated RTR scheduling strategy with data-parallelism but simpler bandwidth considerations.
    Preview · Conference Paper · Jun 2007
  • Source
    • "The first algorithm presented in [2] tries to determine the best partitioning between a fixed and a reconfigurable part using ILP, while the second algorithm presents a partitioning between fixed, reconfigurable and software execution. Other research has focused on determining the optimal number of kernels that have to be configured considering the memory bandwidth [3]. "
    ABSTRACT: In this paper we present an FPGA area allocation algorithm for parallel OpenMP applications that aims to improve performance for a specific reconfiguration area. The algorithm will be implemented in the Delft Workbench, a toolchain developed at TU Delft with a focus on reconfigurable architectures. The hardware platform used to gather the experimental results is a Xilinx Virtex II Pro board with a PowerPC 405 processor and an FPGA. Using profiling information and the structure of the application, we construct a mathematical model which is then used by a modified ILP (Integer Linear Programming) solver to choose the optimal mapping.
    Full-text · Article · Jan 2007