Conference Paper

Customization of application specific heterogeneous multi-pipeline processors.

DOI: 10.1145/1131693
Conference: Proceedings of the Conference on Design, Automation and Test in Europe, DATE 2006, Munich, Germany, March 6-10, 2006
Source: DBLP

ABSTRACT In this paper we propose Application Specific Instruction Set Processors (ASIPs) with multiple heterogeneous pipelines to efficiently exploit the parallelism available at the instruction level. We have developed a design system based on the Thumb processor architecture. Given an application written in C, the design system generates a processor with a number of pipelines tailored to that application, together with the parallel code for the processor. Each pipeline in such a processor is customized and implements its own special instruction set, so that instructions can be executed in parallel with low hardware overhead. Our simulations and experiments with a group of benchmarks, largely from the MiBench suite, show that an average performance improvement of 77% can be achieved compared to a single-pipeline ASIP, with overheads of 49% in area, 51% in leakage power, 17% in switching activity, and 69% in code size.

  • ABSTRACT: The Application Specific Instruction-set Processor (ASIP) is a popular processor design technique for embedded systems that allows customization of the processor without overly hindering design flexibility. Multi-pipeline ASIPs were proposed to improve the performance of such systems by trading off speed against processor area. One problem in multi-pipeline design is the limited inherent instruction-level parallelism (ILP) available in applications. The ILP of application programs can be improved via a compiler optimization technique known as loop unrolling. In this paper, we present the impact of loop unrolling on the performance (speed) of multi-pipeline ASIPs. The speed improvement averages around 15% across a number of benchmark applications, with a maximum improvement of around 30%. In addition, we report how performance varies with the loop unrolling factor, that is, the amount of unrolling performed on an application.
    Industrial and Information Systems (ICIIS), 2009 International Conference on; 01/2010
  • Proceedings of the Peradeniya University Research Sessions (PURSE) 2008; 01/2008
  • ABSTRACT: The GCA (Global Cellular Automata) model is a massively parallel computation model that generalizes the Cellular Automata model. A GCA cell contains data and link information. Using the link information, each cell has dynamic read access to any other cell in the field. The data and link information are updated in every generation. The GCA model is applicable and efficient for a wide range of parallel algorithms (sorting, vector reduction, graph algorithms, matrix computations, etc.). The experimental language GCAL was developed to describe algorithms for the GCA model. GCAL programs can be transformed automatically into a data parallel architecture (DPA). For the N-body problem, the paper shows how the force calculation between the masses can be described in GCAL and synthesized into a data parallel architecture. First, the GCAL description of the application is transformed into a Verilog description, which is inserted into a Verilog template describing the general DPA. The complete Verilog code is then used as input to an FPGA synthesis tool, which generates the application-specific DPA. Two different DPAs are generated, a “horizontal” and a “vertical” DPA. The horizontal DPA uses 17 floating-point operators in each deep pipeline. In contrast, the vertical DPA uses only one floating-point operation at a time out of a set of 6 floating-point operators. Both architectures are compared with respect to resource consumption, time per cell operation, and cost (logic elements * execution time). The horizontal DPA turns out to be approximately 15 times more cost-efficient than the vertical DPA.
    Architecture of Computing Systems - ARCS 2009, 22nd International Conference, Delft, The Netherlands, March 10-13, 2009. Proceedings; 01/2009
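
The loop unrolling transformation named in the first abstract above can be illustrated with a small C sketch. This is only a hand-written illustration, not the compiler pass or benchmark code used in that paper: the function names `dot_rolled` and `dot_unrolled4` and the unrolling factor of 4 are assumptions chosen for the example. The unrolled version keeps four independent partial sums, so the four multiply-accumulate operations in the loop body have no data dependence on one another and could, in principle, be issued to separate pipelines.

```c
#include <stddef.h>

/* Baseline loop: one multiply-accumulate per iteration.
 * Every iteration depends on the previous value of sum. */
int dot_rolled(const int *a, const int *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Same computation unrolled by a factor of 4 (hypothetical factor).
 * The four partial sums s0..s3 are independent of each other, which
 * exposes more instruction-level parallelism inside the loop body. */
int dot_unrolled4(const int *a, const int *b, size_t n)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main unrolled loop: handles four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result for any input length; the unrolled form trades a larger loop body (and therefore larger code size) for fewer loop-control instructions and more independent operations per iteration.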
