Conference Paper

A Novel Data-Path for Accelerating DSP Kernels

University of Patras, Patras, Greece; Aristotle University, Thessalonica, Greece; Southern Illinois University, Carbondale, USA; Democriteus University, Xanthi, Greece
DOI: 10.1007/978-3-540-27776-7_38 Conference: Computer Systems: Architectures, Modeling, and Simulation, Third and Fourth International Workshops, SAMOS 2003 and SAMOS 2004, Samos, Greece, July 21-23, 2003 and July 19-21, 2004, Proceedings
Source: DBLP

ABSTRACT A high-performance data-path for implementing DSP kernels is proposed in this paper. The data-path is based on a flexible, universal, and regular component that allows both inter- and intra-component chaining of operations to be exploited optimally. The introduced component is a combinational circuit with steering logic that makes it easy to realize any desired complex hardware unit, called a template, so that the data-path's performance benefits from intra-component chaining of operations. Owing to the component's flexible and universal structure, the Data Flow Graph is realized by a small number of such components. This small number of components, coupled with a configurable interconnection network, allows direct inter-component connections to be adopted and every inter-component chaining opportunity to be exploited optimally, beyond what existing template-based methods achieve. Also, owing to the universal and flexible structure of the component, scheduling and binding are accomplished by simple yet efficient algorithms that achieve minimum latency at the expense of an area penalty and a small overhead in the control circuit and clock period. Results on DSP benchmarks show an average latency reduction of 20% when the proposed data-path is compared with a high-performance data-path.
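
The latency benefit of chaining operations can be illustrated with a toy scheduler. This is not the authors' scheduling and binding algorithm. It is a hypothetical ASAP scheduler that assumes single-cycle operations, made-up delays (6 ns multiply, 3 ns add), and a 10 ns clock. Whenever the accumulated combinational delay still fits in the clock period, it places an operation in the same control step as its operands.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    delay: float                               # combinational delay in ns (hypothetical)
    preds: list = field(default_factory=list)  # names of producer ops


def schedule(ops, clock_period, chaining=True):
    """Toy ASAP scheduler; ops must be listed in topological order.

    Returns (control step of each op, total latency in control steps).
    Resource limits and multi-cycle operations are deliberately ignored.
    """
    step_of, ready_at = {}, {}  # op -> control step, op -> finish time within that step
    for op in ops:
        assert op.delay <= clock_period, "multi-cycle operations are not modelled"
        if chaining:
            # Earliest (step, time) at which all operands are available;
            # a result may be consumed in the same step it is produced.
            step, start = max([(1, 0.0)] + [(step_of[p], ready_at[p]) for p in op.preds])
            if start + op.delay > clock_period:
                step, start = step + 1, 0.0    # does not fit: move to the next step
        else:
            # Without chaining, every op starts in the step after its last operand.
            step, start = max([0] + [step_of[p] for p in op.preds]) + 1, 0.0
        step_of[op.name] = step
        ready_at[op.name] = start + op.delay
    return step_of, max(step_of.values())


# Hypothetical kernel: y = (a*b + c*d) + e, with 6 ns multiplies,
# 3 ns adds and a 10 ns clock (all values are illustrative assumptions).
EXAMPLE_DFG = [
    Op("m1", 6.0),
    Op("m2", 6.0),
    Op("s1", 3.0, ["m1", "m2"]),
    Op("s2", 3.0, ["s1"]),
]

if __name__ == "__main__":
    for chaining in (False, True):
        steps, latency = schedule(EXAMPLE_DFG, 10.0, chaining)
        print(f"chaining={chaining}: latency={latency} steps, {steps}")
```

In this example the first addition is chained behind the multiplications inside one 10 ns step, so latency drops from three control steps to two. The sketch only loosely mirrors the idea, because the proposed component's internal structure, delays, and binding are not modelled.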

  • ABSTRACT: In this paper we describe a Configuration PRofiling tool (CPR) and show how it can be used to aid compiler designers and FPGA architects, and to assist in the construction of macro-generator libraries. CPR uses subgraph matching to identify the parts of an application that are most important for achieving high performance. Using CPR as a guide, we implemented a few macros for a macro-generator library, which yielded significant improvements in both the quality of configurations and the speed of compilation.
    Field-Programmable Custom Computing Machines, 1999. FCCM '99. Proceedings. Seventh Annual IEEE Symposium on; 02/1999
  • ABSTRACT: Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time, code size, etc.) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).
    Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, FPGA 2004, Monterey, California, USA, February 22-24, 2004; 01/2004
  • ABSTRACT: This paper introduces a new approach to performance-driven template mapping for high-level synthesis. Template mapping, the process of mapping high-level algorithmic descriptions to specialized hardware libraries or instruction sets, involves template matching, template selection, and clock selection. Efficient algorithms for each are presented, and novel issues such as partial matching are addressed. The paper focuses on datapath-intensive ASIC design, though the concepts are also highly applicable to compiler development. Experimental results on examples from real applications show significant improvements in throughput with limited area overhead.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; 09/1996
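
Several of the works listed above revolve around template (or pattern) matching and selection on a data-flow graph. The following is only an illustration. It is not the algorithm of any of these papers or of the proposed data-path. The sketch matches one hypothetical two-operation template, a multiplication chained into an addition, and selects non-overlapping matches greedily. The dictionary-based graph format and the single-consumer rule are assumptions made for this example.

```python
def select_mul_add_templates(dfg):
    """Greedily cover the DFG with non-overlapping multiply-add templates.

    dfg maps node name -> (operation kind, list of predecessor names);
    primary inputs are implicit. A MUL is matched into an ADD only if that
    ADD is its sole consumer, so the product need not leave the template.
    """
    consumers = {n: [] for n in dfg}
    for n, (_, preds) in dfg.items():
        for p in preds:
            consumers[p].append(n)

    used, templates = set(), []
    for n, (kind, preds) in dfg.items():      # dicts preserve insertion order
        if kind != "add" or n in used:
            continue
        for p in preds:
            if dfg[p][0] == "mul" and consumers[p] == [n] and p not in used:
                templates.append((p, n))      # (multiplier, adder) pair
                used.update((p, n))
                break                         # at most one template per adder
    return templates


# Hypothetical graph: s1 = m1 + m2, s2 = s1 + m3, s3 = f(m3).
EXAMPLE = {
    "m1": ("mul", []),
    "m2": ("mul", []),
    "m3": ("mul", []),
    "s1": ("add", ["m1", "m2"]),
    "s2": ("add", ["s1", "m3"]),
    "s3": ("sub", ["m3"]),
}

if __name__ == "__main__":
    print(select_mul_add_templates(EXAMPLE))  # → [('m1', 's1')]
```

Here m1 and s1 form one template. m2 stays stand-alone because s1 is already covered, and m3 is rejected because its result also feeds s3. A practical selector would rank candidates by a cost such as latency or area rather than taking the first match.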
