Chapter

FPGA-Based 128-Bit RISC Processor Using Pipelining


Abstract

The main aim of this work is to implement a 128-bit RISC processor on an FPGA using pipelining techniques and a von Neumann architecture. As FPGAs are used in more and more embedded applications, there is a growing need to support processor designs on FPGA. The proposed processor is a soft processor with a simple instruction set that can be modified to suit the application, thanks to the reconfigurable nature of the FPGA. Its prominent feature is pipelining, which improves performance considerably, so that one instruction is executed per clock cycle. With continuing innovation in processor development and the increasing popularity of open-source projects such as the RISC-V ISA (instruction set architecture), there is also a need to understand these designs quickly and to upgrade them. This can easily be done on an FPGA, with a trade-off in speed and size compared to commercial ASIC processors, and this motivates us to understand these systems. In this paper, a 128-bit RISC processor is implemented on an FPGA using pipelining.

Keywords

RISC: reduced instruction set computer; FPGA: field programmable gate array; ISA: instruction set architecture; ASIC: application specific integrated circuit

Article
Pipelining is a technique that exploits parallelism among the instructions in a sequential instruction stream to increase throughput, and it reduces the total time needed to complete the work. The major objective of this architecture is to design a low-power, high-performance structure which fulfils all the requirements of the design. Critical factors such as power, frequency, area, and propagation delay are analysed using a Spartan 3E XC3E 1600e device with the Xilinx tool. In this paper, a 32-bit MIPS RISC processor is implemented with 6-stage pipelining to optimize the critical performance factors. The fundamental functional blocks of the processor include Input/Output blocks, configurable logic blocks, Block RAM, and the Digital Clock Manager, and each block can connect to multiple sources for routing. The auxiliary units enhance the performance of the processor. The comparative study favors the designed model in terms of area, power, and frequency. MATLAB 2D/3D graphs represent the relationship among the various parameters of this pipeline. The pipeline model achieves very low power consumption (0.129 W), a path delay of 11.180 ns, and low LUT utilization (421). Similarly, the proposed model achieves a higher operating frequency (285.583 MHz), giving better results than the other models.
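The throughput gain from pipelining described in the abstract above can be illustrated with a simple cycle count. The following is a minimal sketch, not taken from the paper: it assumes every stage takes exactly one clock cycle and that the clock period is the same with and without pipelining. The function names and the `stall_cycles` parameter are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Ideal pipeline: n_stages cycles until the first instruction completes,
    then one more instruction completes per cycle. stall_cycles models
    bubbles inserted by hazards (assumed to be known in advance here)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


def speedup(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages, stall_cycles
    )


if __name__ == "__main__":
    # 100 instructions on a 6-stage pipeline, no stalls
    print(pipelined_cycles(100, 6), unpipelined_cycles(100, 6))
    print(round(speedup(100, 6), 2))
```

For long instruction streams the speedup approaches the number of stages, which is why the ideal steady-state rate is one instruction per clock cycle. Stalls push the real figure below that bound.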
Article
In modern computing, multitasking is the most favorable aspect. A CPU with an un-pipelined instruction cycle (fetch-execute cycle) processes instructions one after another, which lengthens the time needed to complete tasks. With a pipelined computer architecture, unprecedented improvements in size and speed are achievable. This work investigates the possibility of further improving computer architecture through an understanding of the inner workings of instruction pipelining in operating systems. A 5-stage pipelined architecture simulator for RiSC-16 processors has been designed in Visual Basic, in contrast to the commonly available four-stage simulators. The simulator also features the two most common pipeline instruction hazards, which are generally missing from most available simulators. The designed simulator is thus an appropriate tool for understanding the concept of pipelining through step-by-step visualization of instruction-cycle processors, facilitating more efficient design in computer architecture. The simulator has been evaluated on its closeness to real-time pipelined computer architecture and through execution of all 8 basic RiSC-16 instructions with data dependency and control hazards.
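The data-dependency hazard mentioned in the abstract above can be sketched as a read-after-write (RAW) check over a short instruction sequence. This is an illustrative sketch only, not the simulator's logic: the `Instr` record, the register numbers, and the `window` parameter (how many preceding instructions are still in flight) are all assumptions.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic, for readability only
    dest: Optional[int]      # destination register, or None if nothing is written
    srcs: Tuple[int, ...]    # source registers read by the instruction


def raw_hazards(program: List[Instr], window: int = 2) -> List[Tuple[int, int, int]]:
    """Return (producer_index, consumer_index, register) for every instruction
    that reads a register written by one of the `window` preceding instructions.
    The right window depends on pipeline depth and forwarding (assumed here)."""
    hazards = []
    for i, consumer in enumerate(program):
        for j in range(max(0, i - window), i):
            producer = program[j]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((j, i, producer.dest))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("add", 1, (2, 3)),   # r1 = r2 + r3
        Instr("add", 4, (1, 5)),   # reads r1 right after it is written
        Instr("nand", 6, (7, 7)),
        Instr("add", 8, (1, 4)),   # reads r4, written two instructions earlier
    ]
    for producer, consumer, reg in raw_hazards(program):
        print(f"instr {consumer} depends on instr {producer} via r{reg}")
```

A simulator could use such a check to decide where to insert stalls or apply forwarding. Control hazards (branches) need a separate mechanism and are not covered by this sketch.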
Conference Paper
This paper presents a novel microprocessor architecture that was especially designed to reduce the power dissipation of modern systems-on-a-chip. The applications we aim at with this architecture are ultra-low-power embedded systems like intelligent medical implants or sensorized micro-transponders. We introduce new types of data storage files and hardware-supported constant elimination to utilize the mostly local scope of common arithmetic operations for reducing energy. A multi-level instruction-cache scheme, together with a cache controller supporting sophisticated opcode preprocessing operations like Huffman decoding, decreases the amount of external memory accesses and size. Additionally, the width of pointers is significantly reduced by a table-lookup cache-miss concept. Finally, a segmented gray-code address counter minimizes the transitions on external buses. All these concepts combine into a completely new type of microprocessor architecture, which is designed to reduce transitions per operation as much as possible.
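To illustrate why a Gray-code address counter reduces bus transitions, the sketch below compares the bit toggles of a plain binary count with those of the standard reflected Gray code. It does not reproduce the segmented counter of the paper, whose details are not given in the abstract; the function names are hypothetical.

```python
from typing import Sequence


def to_gray(n: int) -> int:
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bit_transitions(a: int, b: int) -> int:
    """Number of bus lines that toggle when the value changes from a to b."""
    return bin(a ^ b).count("1")


def total_transitions(values: Sequence[int]) -> int:
    """Total toggles over a sequence of consecutive bus values."""
    return sum(bit_transitions(a, b) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    addresses = list(range(16))                  # sequential 4-bit addresses
    gray_addresses = [to_gray(a) for a in addresses]
    print("binary toggles:", total_transitions(addresses))
    print("gray toggles:  ", total_transitions(gray_addresses))
```

For sequential addresses, consecutive Gray codes differ in exactly one bit, so each increment toggles a single bus line, whereas a binary counter toggles several lines whenever a carry ripples.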
Article
We present a method for reducing the power consumption of compressed-code systems by selectively inverting bits that are transmitted on the bus. By incorporating bus inversion into code compression/decompression, we reduce power consumption with no cost in hardware or power relative to code compression without inversion. Inverting has to be done carefully to ensure that the codes can still be decoded. As an additional challenge, compression will generally increase bit-toggling as it removes redundancies from the code transmitted. Therefore, we need to find the right balance between compression ratio and bit-toggling reduction. This paper presents a suitable algorithm that will combine approximate compression techniques with bit-toggling reduction and will explore the various tradeoffs. We take advantage of the approximations introduced to modify codes and reduce bit-toggling, while maintaining compression performance and decoding speed. An interesting result that is derived from our work is that high compression ratios do not necessarily result in the lowest power consumption. By using our method, bus-related power consumption has been reduced by as much as 35% compared to a system with no compression, and as much as 14% compared to a compressed-code system. Bit-toggling reduction does not impose any additional hardware costs other than the decompression engine. We also present a detailed analysis on how bus widths affect bit-toggling when transmitting compressed code, and we show experimental results on ARM, MIPS, and SPARC code. We finally compare our work with Bus Invert and show results that are superior except for the random data case where Bus Invert performs better.
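The Bus Invert scheme that the abstract above uses as a comparison point can be sketched as follows. This is the classic baseline technique, not the paper's compression-integrated algorithm: a word is sent inverted, with an extra invert line set, whenever sending it unchanged would toggle more than half of the data lines. The bus width, initial bus state, and function names are assumptions.

```python
from typing import List, Sequence, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


def bus_invert_encode(words: Sequence[int], width: int = 8) -> List[Tuple[int, int]]:
    """Return (bus_value, invert_flag) pairs. The data lines are assumed to start at 0."""
    mask = (1 << width) - 1
    bus = 0
    encoded = []
    for word in words:
        word &= mask
        if popcount(bus ^ word) > width // 2:
            bus = ~word & mask           # send the complement, raise the invert line
            encoded.append((bus, 1))
        else:
            bus = word                   # send the word unchanged
            encoded.append((bus, 0))
    return encoded


def bus_invert_decode(encoded: Sequence[Tuple[int, int]], width: int = 8) -> List[int]:
    """Recover the original words from (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(~value & mask) if flag else value for value, flag in encoded]


def data_line_toggles(values: Sequence[int], start: int = 0) -> int:
    """Total toggles on the data lines only (the invert line is not counted)."""
    total, prev = 0, start
    for value in values:
        total += popcount(prev ^ value)
        prev = value
    return total


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x0F, 0xF0]
    encoded = bus_invert_encode(words)
    print(data_line_toggles(words), data_line_toggles([v for v, _ in encoded]))
```

The invert line itself also toggles, so a full power comparison would have to count it as one additional bus line.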
Article
The clock distribution and generation circuitry forms a critical component of current synchronous digital systems and is known to consume at least a quarter of the power budget of existing microprocessors. We propose and validate a high level model for evaluating the energy dissipation of the clock generation and distribution circuitry, including both the dynamic and leakage power components. The validation results show that the model is reasonably accurate, with the average deviation being within 10% of SPICE simulations. Access to this model can enable further research at high-level design stages in optimizing the system clock power. To illustrate this, a few architectural modifications are considered and their effect on the clock subsystem and the total system power budget is assessed.
Article
This paper models the clock behavior in a sequential circuit by a quaternary variable and uses this representation to propose and analyze two clock-gating techniques. It then uses the covering relationship between the triggering transition of the clock and the active cycles of various flip-flops to generate a derived clock for each flip-flop in the circuit. A technique for clock gating is also presented, which generates a derived clock synchronous with the master clock. Design examples using gated clocks are provided next. Experimental results show that these designs have ideal logic functionality with lower power dissipation compared to traditional designs.
Article
Power dissipation is rapidly becoming a limiting factor in high performance microprocessor design due to ever increasing device counts and clock rates. The 21264 is a third generation Alpha microprocessor implementation, containing 15.2 million transistors and operating at 600 MHz. This paper describes some of the techniques the Alpha design team utilized to help manage power dissipation. In addition, the electrical design of the power, ground, and clock networks is presented.
2. INTRODUCTION
Digital introduced the Alpha 21064 [1] microprocessor in 1992, thus delivering the industry's highest performance at that time. Manufacturing process technology advancements, architectural innovations, and full-custom circuit design techniques have been significant contributors to Digital's delivery of two additional generations of performance leadership Alpha microprocessors [2,3]. Between the first generation Alpha 21064 and the third generation Alpha 21264 designs, device counts have...
Design and development of FPGA based low power pipelined 64-bit RISC processor with double precision floating point unit
  • J Vijaykumar
  • B Nagaraju
  • C Swapna
  • T Ramanujappa