Conference Paper

Customization of application specific heterogeneous multi-pipeline processors

DOI: 10.1145/1131693
Conference: Proceedings of the Conference on Design, Automation and Test in Europe, DATE 2006, Munich, Germany, March 6-10, 2006
Source: DBLP


In this paper, we propose Application Specific Instruction Set Processors (ASIPs) with multiple heterogeneous pipelines to efficiently exploit the available instruction-level parallelism. We have developed a design system based on the Thumb processor architecture. Given an application specified in C, the design system generates a processor with a number of pipelines specifically suited to the application, together with the parallel code for that processor. Each pipeline in such a processor is customized and implements its own special instruction set, so that instructions can be executed in parallel with low hardware overhead. Our simulations and experiments with a group of benchmarks, largely from the MiBench suite, show that on average a 77% performance improvement can be achieved compared to a single-pipeline ASIP, with overheads of 49% on area, 51% on leakage power, 17% on switching activity, and 69% on code size.



Available from: Sri Parameswaran, Sep 11, 2014
    • "We generated codes with and without loop unrolling. Both types of assembly codes were scheduled into a number of pipelines based on the available ILP using the algorithm specified in [3]. The scheduled code was assembled into binary. "
    ABSTRACT: The Application Specific Instruction-set Processor (ASIP) is a popular processor design approach for embedded systems that allows customizability in processor design without overly hindering design flexibility. Multi-pipeline ASIPs were proposed to improve the performance of such systems by trading off speed against processor area. One problem in multi-pipeline design is the limited inherent instruction-level parallelism (ILP) available in applications. The ILP of application programs can be improved via a compiler optimization technique known as loop unrolling. In this paper, we present the impact of loop unrolling on the performance (speed) of multi-pipeline ASIPs. The improvement in speed averages around 15% across a number of benchmark applications, with a maximum improvement of around 30%. In addition, we report how performance varies with the loop unrolling factor, that is, the amount of unrolling performed on an application.
    Full-text · Conference Paper · Jan 2010
    • "The approach focuses on encoding instructions for opcode field, assuming operand field length for each instruction is given. These approaches are not tailored to our target processors proposed in [16]. The techniques presented in this paper exploits the unique architectural features of our target ASIP, where each pipeline has different control unit for different set of instructions. "
    ABSTRACT: Small area and small code size are two critical design issues in most embedded system designs. In this paper, we tackle these issues by customizing forwarding networks and instruction encoding schemes for multi-pipe Application Specific Instruction-Set Processors (ASIPs). Forwarding is a popular technique that reduces data hazards in the pipeline to improve performance and is applied in almost all modern processor designs, but it is very expensive in area. Instruction encoding schemes have a direct impact on code size: an efficient encoding method can lead to a small instruction width and hence reduce the code size. We propose application-specific techniques to reduce forwarding networks and instruction widths for ASIPs with multiple pipelines. With these design techniques, it is possible to reduce area, code size, and even power consumption (due to reduced area) without costing any performance. Our experiments on a set of benchmarks using the proposed customization approaches show that, on average, there are savings of 27% on area, 30% on leakage power, and 16.7% on code size, while performance even improves by 4% because of the reduced clock period.
    Full-text · Conference Paper · Nov 2006
