Design and Implementation of a Five Stage Pipelining Architecture Simulator for RiSC-16 Instruction Set

Abstract

In modern computing, multitasking is a highly valued capability. A CPU with an un-pipelined instruction cycle (fetch-execute cycle) processes instructions one after another, so tasks take longer to complete. With pipelined computer architecture, significant improvements in size and speed are achievable. This work investigates possible improvements to computer architecture through understanding the inner workings of instruction pipelining. A 5-stage pipelined architecture simulator for RiSC-16 processors has been designed using Visual Basic, in contrast to the commonly available four-stage simulators. The simulator also features the two most common pipeline instruction hazards, which are generally missing in most available simulators. Thus, the designed simulator is an appropriate tool for understanding the concept of pipelining through step-by-step visualization of the instruction cycle, facilitating more efficient design in computer architecture. The simulator has been evaluated based on its closeness to real-time pipelined computer architecture and through execution of all 8 basic RiSC-16 instructions with data dependency and control hazards.
Indian Journal of Science and Technology, Vol 10(3), DOI: 10.17485/ijst/2017/v10i3/110622 January 2017
ISSN (Print) : 0974-6846
ISSN (Online) : 0974-5645
* Author for correspondence
1. Introduction
Computer technology has evolved through various
architectures since the birth of the first generation of
computers around the 1940s, and people are
always looking for ways to improve computer
performance [1]. An instruction pipeline is a technique
often used in the design of modern microprocessors,
microcontrollers and CPUs to increase their instruction
throughput per unit time [1,2].
Pipelining is a standard feature in Reduced Instruction
Set Computing (RISC) processors and is analogous to a
manufacturing plant assembly line. This is because the
processor works on different steps of several instructions at
the same time, so more instructions can be executed
in a shorter period of time [3]. Pipelining is typically
implemented in RISC processors rather than in Complex
Instruction Set Computing (CISC) processors. Pipelining
has proved to be more efficient because the traditional instruction
cycle wastes CPU resources: instructions may
involve other services such as reads and writes to memory,
storage or input/output devices, and the CPU is idle during
this time. This prolongs the latency of an instruction
and reduces the throughput of a program. As computer
systems evolve, greater performance is achieved by
taking advantage of improvements in technology, such
as faster circuitry, and organizational enhancements, such
as adding instruction pipelining to the processor [1,4–6]. By
implementing pipelining, the processing of instructions
is overlapped as illustrated in Figure 1, meaning that while
Keywords: Computer Processor & Architecture, Instruction Pipelining, RiSC-16 Simulator
Rashidah F. Olanrewaju1*, Fawwaz E. Fajingbesi1, S. B. Junaid2,
Ridzwan Alahudin1, Farhat Anwar1 and Bisma Rasool Pampori3
1Department of Electrical and Computer Engineering, IIUM, Malaysia; frashidah@iium.edu.my,
fawwazfajingbesi@yahoo.com, wanxneo89@gmail.com, farhat@iium.edu.my
2Department of Electrical and Computer Engineering, Ahmadu Bello University, Zaria;
abuyusra@gmail.com
3Department of Information Technology, Central University of Kashmir,
Srinagar–190015, Jammu and Kashmir, India
Vol 10 (3) | January 2017 | www.indjst.org Indian Journal of Science and Technology
the first instruction is in the decode stage, the second
instruction is fetched. When the first instruction is in the
execute stage, the second instruction moves to the decode
stage and another instruction, instruction 3, is fetched.
The first instruction is complete only after three cycles.
Subsequently, an instruction is completed on every cycle.
Figure 1. Fetch-execute cycle with pipelining.
This differs from a non-pipelined system, shown in
Figure 2, where three cycles are required per instruction.
Figure 2. Un-pipelined fetch-execute cycle.
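The cycle counts behind Figures 1 and 2 can be checked with a short calculation. The sketch below is illustrative only: it assumes an ideal pipeline with one cycle per stage and no hazards, and the function names are hypothetical rather than part of the simulator.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction needs n_stages cycles; each later one finishes one cycle after."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    # Three-stage fetch/decode/execute cycle, as in Figures 1 and 2.
    for n in (1, 3, 10):
        print(n, unpipelined_cycles(n, 3), pipelined_cycles(n, 3))
```

Under these assumptions, three instructions need nine cycles without pipelining and five cycles with it.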
Note that however attractive pipelining may sound,
there are situations where the next instruction
cannot execute in the next clock cycle. These situations
are usually termed hazards. There are three basic types
of hazards: structural, data and control
hazards [7]. Understanding hazards and how they affect
processor operations would increase the overall efficiency of
such systems, hence the need for a simulator design.
In 1997 [8], a pipeline simulator for the DLX
processor known as WinDLX was created. It
was an MS-Windows (16-bit) based pipeline simulator
written in C++. The simulator was modelled on
Hennessy and Patterson’s DLX at the architectural level. It was
intended for educational purposes, to help visualize the
concept of instruction pipelining.
Another implementation of pipeline simulation
used the ModelSim simulator and the Xilinx ISE tool software [2].
The authors divided pipelining into four sub-stages (fetch,
decode, execute and store) and used Verilog HDL to design each
stage. The technology schematics for each
stage were presented and, by using this technique, a user
could gain a better understanding of the internal workings
of a processor. However, the user is required to possess
special skills in order to simulate pipelining in ModelSim
and the Xilinx tool.
Similar work on a pipeline simulator used
Java programming with a specific focus on student
interactivity [9]. The authors chose the RiSC-16 processor because it is
simple, complete and designed for educational
purposes. Their system offered users the ability to define
their own programs in assembly language and to
see graphically the corresponding internal dynamic
behavior of the processor.
Subsequently, other designs have featured a web-based
simulation model capable of exploring the state space
defined by a Unified Parallel Model and simulator [7,10]. The authors
stated that the simulator could be used as a calculator,
deterministically calculating speedups given input
parameters. The Unified Parallel Model and simulator
can be used to explore the continuum of parallel speedup
possibilities, allowing users to explore different computer
architectures with hardware support at any or all of five
levels of parallelism, from intra-instruction (pipeline)
through a distributed n-tier client/server system. The
tool supports the simulation of various user-configurable
architectures and interconnection networks,
running a user-configurable and variable workload.
The rest of this work is organized as follows: Section
2 presents the methodology, Section 3 the results and
discussion, and Section 4 the conclusion.
2. Methodology
The focus of this work is on the design and implementation
of a 5-stage pipeline simulator for a RiSC-16 instruction
set processor. Figures 3 and 4 are examples of a five-stage
pipeline architecture.
Figure 3. Five stage pipeline model [11].
Figure 4. Five stages pipelined [11].
The designed algorithm is simple yet novel, as it
offers not only user-friendliness but also the detailed features
required for understanding and visualizing the concept
of the pipelined instruction cycle. Features offered include
data and control hazards with their solutions, and multiple
methods for instruction supply and translation, making it
unique amongst its peers. The system can be broken down
into five stages: Instruction Fetch (IM), Decode with Register Read
(REG), Execute (ALU), Data Memory (DM) and Register
Write (REG).
2.1 Design and Algorithm of Components
Algorithm
For CLOCK
  The timer in VB.net alternates a Boolean variable between true and false, giving positive and negative clock pulses:
    IF clocksig is false THEN clocksig = true
    ELSE IF clocksig is true THEN clocksig = false
  Controlling the five stages and the five latches:
    IF clocksig is true, call all the functions of the five stages and the five latches
For LATCH
  Five latches are represented by a set of rich textboxes. All of these latches update their values when there is a positive clock pulse. Each latch may contain a different number of values:
    PC: contains the program counter value (the address of the instruction that will be fetched). The value is updated by an increment provided by stage 2.
    L1: contains the instruction that has been fetched by the IM stage
    L2: contains the operands to be executed at the ALU and the memory access signal
    L3: contains the result of the ALU operation together with the register address and the memory access signal
    L4: contains the value that will be stored into a register together with the register address
  An adder, whose increment is set to one by default, controls the value of PC.
  For L1 to L4, the functions latch*() copy the output from their respective stages at every positive clock pulse, where * represents 1-4.
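The clock and latch behaviour described above can be sketched in a few lines. This is a minimal illustration in Python rather than the simulator's Visual Basic code; the class and function names (Clock, Latch, run_ticks) are hypothetical, and the stage outputs are passed in as fixed values purely for demonstration.

```python
class Latch:
    """Holds a copy of a stage's output between clock pulses."""

    def __init__(self, default):
        self.value = default

    def update(self, stage_output):
        self.value = stage_output


class Clock:
    """Boolean signal that alternates between False and True on every timer tick."""

    def __init__(self):
        self.clocksig = False

    def tick(self):
        self.clocksig = not self.clocksig
        return self.clocksig


def run_ticks(clock, latches, stage_outputs, n_ticks):
    """On each positive pulse, every latch copies the output of its stage."""
    positive_pulses = 0
    for _ in range(n_ticks):
        if clock.tick():  # True means a positive clock pulse
            for latch, output in zip(latches, stage_outputs):
                latch.update(output)
            positive_pulses += 1
    return positive_pulses
```

Only every second timer tick is a positive pulse, so the latches update at half the timer rate.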
For STAGES
FOR Instruction Fetch
  Set default output: “0000000000000000”
  Only fetch if the PC is less than the number of program memory slots; otherwise the program stops.
  Skip the stage if the program memory slot at the address given by PC contains nothing
  Increment PC if an instruction is fetched
FOR Decode & regRead
  Set default outputs to zeros
  Skip if the previous stage has not fetched any instruction
  Set default PC increment to 1
  Set default multiplexer control signal values (for bypassing data dependencies)
  Read the first 3 bits of the instruction string as the opcode
  Get the address of register A (3 bits) from the instruction (except for the BEQ operation)
  Identify the operation of the instruction based on the opcode:
    “000” or “010” (add or nand)
    “001” or “100” or “101” (addi or sw or lw)
    “011” (lui)
    “110” (beq)
    “111” (jalr)
  Check for dependency (*only if the user ticked bypassing/forwarding in the “Hazard” group box)
  Write the meaning of the instruction in the “Operation:” rich textbox
  IF “000” or “010” (add or nand)
    Get the address of register B (3 bits) and the address of register C (3 bits) from the instruction code
    Get the values of register B and register C
    Set the output of stage 2:
      Output0: “00” (addition) / “01” (nand)
      Output1: regB value in decimal
      Output2: regC value in decimal
      Output3: regA address
      Output4: 0 (no Data Memory access)
    *Dependency: Check if registers B and C are in latch 2 or 3; if yes, set the multiplexer input to the next latch that will be the output of the respective registers. Exception for the register with address “0”
  IF “001” or “100” or “101” (addi or sw or lw)
    Get the address of register B from the instruction code
    Get the value of the signed immediate from the instruction code
    Get the value of register B
    Set the output of stage 2:
      Output0: “00” (add)
      Output1: regB value in decimal
      Output2: signed Imm value in decimal
      Output3: regA address / regA value (for sw operation)
      Output4:
        = 0 for “001” (addi) operation (no Data Memory stage)
        = -1 for “100” (sw) operation (from register to memory)
        = 1 for “101” (lw) operation (from memory to register)
    *Dependency: Check for register B content in latch 2 or 3; if yes, set the multiplexer input to the next latch that will be the output of the respective register. Exception for register r0
  IF “011” (lui)
    Get the value of the unsigned immediate (10 bits) from the instruction code
    Set the output of stage 2:
      Output0: “10” (and)
      Output1: unsigned immediate value in decimal
      Output2: &HFFC0
      Output3: regA address
      Output4: 0 (no Data Memory stage)
  IF “110” (beq)
    Get the address of register B
    Get the values of register B and register A
    Get the value of the signed Imm
    *Dependency: Check if registers A and B are in latch 2, 3 or 4.
      IF true, replace the register value(s) as follows (except for register r0):
        IF the register is in latch 2, the latest value is in the stage 3 output
        IF the register is in latch 3, the latest value is in the latch 3 output
        IF the register is in latch 4, the latest value is in the latch 4 output
    Perform the branch-if-equal operation:
      IF Ra = Rb THEN (beq operation, branch if equal) change the PC increment to the signed Imm value
    Set the stage 2 outputs to default.
  IF “111” (jalr)
    Get the address of register B
    Get the values of register B, register A and latch (PC-1)
    *Dependency: Check for register B content in latch 2 or 3; if yes, set the multiplexer input to the next latch, which will be the output of the respective register. Exception for register
    Store the address of the jalr operation (PC-1) to register A, and set PC to the register B value
    Set the output of stage 2 to default.
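The field extraction performed by the decode stage can be illustrated as follows. This is a sketch in Python, not the simulator's code. The listing fixes only the 3-bit opcode and 3-bit register A at the start of the instruction string; the remaining field positions (register B in bits 6-8 of the string, register C in the last 3 bits, a 7-bit signed or 10-bit unsigned immediate at the end) are assumed here from the standard RiSC-16 encoding. The names decode_fields and to_signed are hypothetical.

```python
def to_signed(value, bits):
    """Interpret an unsigned integer as a two's complement number of the given width."""
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def decode_fields(instruction):
    """Split a 16-character binary string into opcode, register and immediate fields."""
    if len(instruction) != 16 or set(instruction) - {"0", "1"}:
        raise ValueError("expected a 16-bit binary string")
    opcode = instruction[0:3]
    fields = {"opcode": opcode, "regA": int(instruction[3:6], 2)}
    if opcode in ("000", "010"):      # add, nand: regB and regC (assumed positions)
        fields["regB"] = int(instruction[6:9], 2)
        fields["regC"] = int(instruction[13:16], 2)
    elif opcode == "011":             # lui: 10-bit unsigned immediate
        fields["imm"] = int(instruction[6:16], 2)
    elif opcode == "111":             # jalr: regB only
        fields["regB"] = int(instruction[6:9], 2)
    else:                             # addi, sw, lw, beq: regB and 7-bit signed immediate
        fields["regB"] = int(instruction[6:9], 2)
        fields["imm"] = to_signed(int(instruction[9:16], 2), 7)
    return fields
```

For example, under this assumed layout the string "0010010000000100" decodes as addi with register A = 1, register B = 0 and immediate 4.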
FOR ALU - perform logic and mathematical operations
  Get the operands from the multiplexers
  Get the address of register A and the operation code from latch 2
  Perform the operation for the given operation code:
    IF 00: operand1 + operand2
    IF 01: Not (operand1 and operand2)
    IF 10: operand1 and operand2
  Set the outputs of stage 3:
    Output0: result of the operation
    Output1: register A
    Output2: Output4 of stage 2
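The three ALU operations can be sketched as a single function. This is an illustrative Python version, not the simulator's code; truncating results to 16 bits is an assumption made here because the registers hold 16-bit data, and the name alu is hypothetical.

```python
MASK16 = 0xFFFF  # assumed: results are truncated to the 16-bit register width


def alu(op_code, operand1, operand2):
    """Apply the operation selected by the 2-bit operation code from stage 2."""
    if op_code == "00":        # addition (add, addi, sw, lw)
        result = operand1 + operand2
    elif op_code == "01":      # nand
        result = ~(operand1 & operand2)
    elif op_code == "10":      # and (used by lui with the mask &HFFC0)
        result = operand1 & operand2
    else:
        raise ValueError("unknown operation code: " + op_code)
    return result & MASK16
```

With these definitions, alu("10", 77, 0xFFC0) yields 64, which matches the value the lui instruction produces in the combined test program of Section 3.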
FOR Data Memory - perform memory access
  Get the third output of latch 3
  Perform the operation based on that output as follows:
    IF “0” (dummy), pass the other two outputs to the next latch:
      Output0 of stage 4 = Output0 of latch 3
      Output1 of stage 4 = Output1 of latch 3
    IF “1” (get data from memory):
      Output0 of stage 4 = dataMemory[Output0 of latch 3]
      Output1 of stage 4 = Output1 of latch 3
    IF “-1” (store data to memory):
      dataMemory[Output0 of latch 3] = Output1 of latch 3
      Set the output of stage 4 to zeros
FOR RegWrite - write the result into the specified register
  Get the outputs of latch 4
  Store Output0 of latch 4 into register r[Output1 of latch 4]
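The last two stages can be sketched together. The Python below is an illustration under stated assumptions, not the simulator's code: latch contents are modelled as tuples, memory and registers as plain lists, and the function names are hypothetical. The read-only treatment of register 0 follows the register description in this section.

```python
def data_memory_stage(latch3, data_memory):
    """latch3 is (alu_result, reg_a, mem_signal); returns the two outputs of stage 4."""
    alu_result, reg_a, mem_signal = latch3
    if mem_signal == 0:        # dummy: pass both outputs through
        return alu_result, reg_a
    if mem_signal == 1:        # get data from memory
        return data_memory[alu_result], reg_a
    if mem_signal == -1:       # store data to memory
        data_memory[alu_result] = reg_a
        return 0, 0
    raise ValueError("unknown memory access signal")


def reg_write_stage(latch4, registers):
    """latch4 is (value, register_address); register 0 is read-only."""
    value, address = latch4
    if address != 0:
        registers[address] = value
```

Because a store sets the stage 4 outputs to zeros, the following register write targets register 0 and therefore changes nothing.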
For PROGRAM MEMORY
  Represents instruction codes of 16 bits in binary format
  Array of textboxes, 40 textboxes in total
  Addresses are in decimal format
  16-bit addressing
For DATA MEMORY
  Presents 16-bit data in binary format
  Array of textboxes, 10 textboxes in total
  Addresses are in decimal format
  16-bit addressing
For REGISTER
  Used to store 16-bit data in binary format
  Array of textboxes, 8 textboxes in total
  Addresses are in decimal format
  16-bit addressing
  Register 0 is read-only and contains the null value
For MISCELLANEOUS
  “Hazard” group box: allows the user to choose whether or not to use the dependency checker
  Buttons:
    “RUN”: activates/deactivates the clock that controls the pipeline system.
    “Reset Program”: resets latch and stage inputs/outputs to default.
    “Write New”: opens a new form for the user to write a set of instructions.
    “Load”: opens a txt file to be loaded into program memory.
    “Clear”: resets the simulation to default.
  “Operation” rich textbox: provides a translation of each instruction for the user to view
  toDecimal() function: converts a binary number to decimal format
  toBinary() function: converts a decimal number to binary
  mux1() and mux2() functions: act as multiplexers between the latches to provide input to stage 3 (ALU), which is the solution for the dependency hazard.
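The two conversion helpers can be sketched as below. This is a Python illustration, not the simulator's toDecimal() and toBinary(); it assumes 16-bit two's complement for negative values, which the listing does not state explicitly, and the snake_case names are hypothetical.

```python
def to_binary(value, bits=16):
    """Convert a decimal integer to a fixed-width binary string (two's complement assumed)."""
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError("value does not fit in {} bits".format(bits))
    return format(value & ((1 << bits) - 1), "0{}b".format(bits))


def to_decimal(binary, signed=True):
    """Convert a binary string to a decimal integer, optionally as two's complement."""
    value = int(binary, 2)
    if signed and binary[0] == "1":
        value -= 1 << len(binary)
    return value
```

For instance, to_binary(9) gives "0000000000001001", the 16-bit form of the bracketed value 1001B expected in register R1 in the data hazard test of Section 3.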
3. Result and Discussion
Screenshots of the designed simulator's user interface and data input
window are shown in Figures 5 and 6 respectively. The simulator
evaluation mainly tested the response to a data hazard,
a control hazard and a combination of all 8 basic RiSC-16
instructions; the results are shown in Figures 7–12.
The operations and code are given below.
For Data Hazard evaluation:
Perform addition of operands:
addi 1, 0, 4    addi 2, 0, 5    add 1, 1, 2    add 3, 2, 1
Translates as:
R1 should contain 4+5 = 9: [1001B]
R3 should contain 9+5 = 14: [1110B]
Data dependency between:
addi 2, 0, 5 and add 1, 1, 2
add 1, 1, 2 and add 3, 2, 1
First solution: stall/nop operation (increases CPI)
Second solution: bypassing/forwarding
• Implement multiplexers which are controlled by the Decode
stage
• Check for dependency by comparing operand
registers with the previous store-to register at each latch
• Set the multiplexer to select the latest value of the
operand(s)
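The bypassing idea in the bullets above can be sketched as an operand-selection function. The Python below is a simplified illustration, not the simulator's mux1()/mux2() code: in-flight results are modelled as a list of (destination register, value) pairs ordered from newest to oldest, and the names forward_operand and demo are hypothetical.

```python
def forward_operand(reg, register_file, in_flight):
    """Pick the latest value of `reg`.

    in_flight lists (dest_reg, value) pairs for results not yet written back,
    ordered from the most recent instruction to the oldest.
    """
    if reg != 0:                      # register 0 never needs forwarding
        for dest_reg, value in in_flight:
            if dest_reg == reg:
                return value          # bypass: newest matching result wins
    return register_file[reg]


def demo():
    """Replay the data hazard program with no result written back yet."""
    registers = [0] * 8
    in_flight = [(2, 5), (1, 4)]      # addi 2, 0, 5 and addi 1, 0, 4 still in the pipeline
    r1 = forward_operand(1, registers, in_flight) + forward_operand(2, registers, in_flight)
    in_flight.insert(0, (1, r1))      # add 1, 1, 2 is now the newest producer of r1
    r3 = forward_operand(2, registers, in_flight) + forward_operand(1, registers, in_flight)
    return r1, r3
```

In this sketch demo() returns (9, 14), whereas reading the stale register file directly would give 0 for both sums.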
For Control Hazard evaluation:
Branch operation:
addi 1, 0, 2    addi 2, 0, -1    addi 3, 0, 3    beq 1, 0, 4
add 0, 0, 0    add 1, 1, 2    jalr 7, 3    add 0, 0, 0
add 7, 0, 2
Translates as:
r1 = 2, r2 = -1, r3 = 3
While r1 not equal r0,
do r1 = r1+r2
End while
r0+r2 → r7
Decrement r1 by one until it equals zero and finally
store r0 + r2 in r7
• The instruction after a branch operation will be fetched and
executed regardless of whether the condition is true or false
To avoid any false execution, the branch-delay slot
method is applied at the compiler and the slots are filled with
“nop” operations.
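The effect of the branch-delay slot on this program can be traced with a small instruction-level interpreter. The Python sketch below is not cycle-accurate and is not the simulator's code. It assumes that a taken beq jumps to its own address plus one plus the immediate, that jalr stores its own address in register A as the decode listing describes, and that exactly one instruction after each branch always executes. The tuple format and the name run are hypothetical.

```python
CONTROL_HAZARD_PROGRAM = [
    ("addi", 1, 0, 2), ("addi", 2, 0, -1), ("addi", 3, 0, 3), ("beq", 1, 0, 4),
    ("add", 0, 0, 0), ("add", 1, 1, 2), ("jalr", 7, 3, 0), ("add", 0, 0, 0),
    ("add", 7, 0, 2),
]


def run(program, max_steps=1000):
    """Execute (op, a, b, c) tuples with one branch-delay slot; return the registers."""
    regs = [0] * 8
    pc = 0
    pending = None                    # branch target that applies after the delay slot
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, a, b, c = program[pc]
        target = None
        result = None
        if op == "addi":
            result = regs[b] + c
        elif op == "add":
            result = regs[b] + regs[c]
        elif op == "beq":
            if regs[a] == regs[b]:
                target = pc + 1 + c   # assumed branch target
        elif op == "jalr":
            target = regs[b]
            result = pc               # address of the jalr itself, per the listing
        else:
            raise ValueError("unsupported operation: " + op)
        if result is not None and a != 0:   # register 0 is read-only
            regs[a] = result
        # A branch decided by the previous instruction takes effect now.
        pc, pending = (pending if pending is not None else pc + 1), target
    return regs
```

Under these assumptions the program ends with r1 = 0, r2 = -1, r3 = 3 and r7 = -1, and the "add 0, 0, 0" instructions in the delay slots have no visible effect.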
For Combination of all 8 basic instructions:
addi 1, 0, 5    addi 2, 0, -3    add 3, 1, 1    add 5, 2, 2
add 3, 3, 1    add 5, 5, 2    add 3, 3, 1    nand 5, 5, 5
add 3, 3, 1    addi 5, 5, 1    add 4, 3, 3    add 6, 5, 5
add 6, 6, 5    add 7, 6, 4    sw 7, 0, 3    lui 1, 77
addi 2, 0, -1    addi 3, 0, 19    lw 4, 0, 3    beq 4, 1, 5
add 0, 0, 0    add 4, 4, 2    jalr 6, 3    add 0, 0, 0
add 0, 0, 0    sw 2, 0, 0
Translates as:
Perform 2x^2 + 3y^2 with x = 5 and y = -3
Store the result in data memory at location 3.
Decrement the result until it equals 64
Then store the R2 value to data memory at location 0
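The values this program is expected to produce can be checked by hand or with a few lines of arithmetic. The Python sketch below only reproduces the expected numbers and does not simulate the pipeline; it assumes, following the decode listing, that lui combines its immediate with the mask &HFFC0 through the ALU "and" operation. The function names are hypothetical.

```python
def expected_result(x, y):
    """Value of 2x^2 + 3y^2 computed by the first part of the program."""
    return 2 * x * x + 3 * y * y


def lui_value(imm):
    """Loop bound loaded by lui, assuming the immediate is ANDed with &HFFC0."""
    return imm & 0xFFC0


def countdown_steps(start, stop, step=-1):
    """Number of times the loop adds `step` before `start` reaches `stop`."""
    steps = 0
    value = start
    while value != stop:
        value += step
        steps += 1
    return steps
```

With x = 5 and y = -3 the result is 77, lui 1, 77 yields 64 under this assumption, and the loop therefore decrements the result 13 times.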
Figure 5. Main user interface of the simulator.
Figure 6. “Write New Instruction” window.
Figure 7. Operation with occurrence of data dependency.
Figure 8. Operation with bypassing method.
Figure 9. Register values before the “beq” operations.
Figure 10. Register values after all operations.
Figure 11. Registers and data memory after computation.
Figure 12. Registers and data memory after completion of all instructions.
4. Conclusion
This research has presented a novel design and
implementation of a pipeline instruction set simulator
using Visual Basic. The simulator offers visualization,
and hence proper understanding, of the concepts and processes
involved in the pipelined instruction cycle of a RiSC-16
processor. It is capable of simulating and solving the data
dependency and control hazards experienced in real
processors, a capability which is lacking in most available simulators,
as ascertained by the review, yet is a key component
of the pipeline concept. The solutions to these hazards are also
within the simulator. When users implement bypassing
and branch delay techniques to counter hazards, the
process can be visualized within the simulator. The
designed simulator is also equipped to operate at three
different clock speeds and to accept up to forty instructions at
once in three different user-friendly ways (ASM, Text and
Predefined Template), thereby providing its users with an
effective learning environment for computer architecture and
processor pipelining.
5. Acknowledgement
This work was partially supported by Ministry of Higher
Education Malaysia (Kementerian Pendidikan Tinggi)
grants under numbers FRGS15-254-0495 and RIGS16-084-0248.
Table 1. Literature review summary

Author: Current work
Method: Built using Visual Basic programming for the RiSC-16 processor architecture
Merits and comments:
• Simple design concept
• User-friendly, with no professional skills required, as the instruction set can be in plain text, ASM or RiSC keyword
• Five-stage pipeline for higher efficiency
• Capable of simulating data and control hazards with their solutions
• Fully capable of handling the 5-stage pipelined architecture of the RiSC-16 processor
• Capable of handling up to 40 instructions
• Visualization for better understanding of the pipeline concept
• Perfect for educational purposes

Author: Rakesh et al. [2]
Method: Built using Verilog HDL on a predesigned utility simulator (ModelSim) and Xilinx ISE tools
Merits and comments:
• Modelled at architecture level
• Intended for educational purposes
• Requires professional skill to operate
• Simulates through pre-published software, as no dedicated simulator was built from scratch
• Designed with a focus on handling four-stage pipelining
• No provision for executing and handling hazards, which are crucial in understanding and perfecting the pipeline concept

Author: Osée et al. [9]
Method: Built using Java for the RiSC-16 processor architecture
Merits and comments:
• Designed from scratch
• Capable of only a 4-stage pipelined architecture of the RiSC-16 processor
• Intended for educational purposes
• No provision for executing and handling hazards, which are crucial in understanding and perfecting the pipeline concept

Author: Hoganson et al. [7,10]
Method: Built a web-based simulation model
Merits and comments:
• Complex design
• Supports multiple architectures
• Capable of handling five-stage pipelining
• Speed calculation capabilities
• Intended for educational purposes
• No provision for executing and handling hazards, which are crucial in understanding and perfecting the pipeline concept

Author: Grünbacher et al. [8]
Method: Built using C++ programming for the DLX processor (WinDLX) in MS-Windows (16 bit)
Merits and comments:
• Targeted at a pipeline simulator for the DLX processor (WinDLX)
• Modelled at architecture level
• Intended for educational purposes
• No provision for executing and handling hazards, which are crucial in understanding and perfecting the pipeline concept
6. References
1. Rakesh MR, Ajeya B, Mohan AR. Novel architecture of 17 bit address RISC CPU with pipelining technique using Xilinx in VLSI Technology. International Journal of Engineering Research and Applications. 2014; 4(5):116–121.
2. Rakesh MR. Design and simulation of four stage pipelining architecture using the Verilog. International Journal of Science and Research. 2014; 3(3):108–12.
3. Rana S, Mehra R. Design & simulation of RISC processor using hyper pipelining technique. IOSR Journal of Mechanical and Civil Engineering (IOSR-JMCE). 2013; 9(2):49–57.
4. Trivedi P. Design & analysis of 16 bit RISC processor using low power pipelining. 2015 International Conference on Computing, Communication & Automation (ICCCA); 2015:1294–7.
5. Finlayson I, Uh GR, Whalley DB, Tyson G. An overview of static pipelining. IEEE Computer Architecture Letters. 2012; 11(1):17–20.
6. Cheah HY, Fahmy SA, Kapre N. Analysis and optimization of a deeply pipelined FPGA soft processor. 2014 International Conference on Field-Programmable Technology (FPT); 2015. p. 235–8.
7. Hoganson KE. High-performance computer architecture and algorithm simulator. Journal on Educational Resources in Computing. 2002; 2(1):131–48.
8. Grunbacher H. Teaching computer architecture/organisation using simulators. 28th Annual Frontiers in Education Conference, FIE’98. Treitlstrasse Vienna Austria. 1998; 3:1107–12.
9. Osée M, Richard A, Biest AV, Mathys P. Educational simulation of the RiSC processor. International Conference on Engineering Education (ICEE 2007); 2007.
10. Hoganson K. The unified parallel speedup model and simulator. Southeast Regional ACM Conference; 2001. p. 1–23.
11. Balasubramonia R. CS6810 computer architecture. University of Utah: Youtube; 2012.
12. Jacob PB. The pipelined RiSC-16. ENEE 446: Digital Computer Design, Fall 2000; 2000. p. 1–9.
... Currently, the VLSI digital systems design overburdened with many complex features. Multitasking, parallelism makes the system slow and consumes more power to meet these customer requirements; and designers have to compromise with the critical factors [1]. ...
Article
Full-text available
Pipelining is a technique that exploits parallelism, among the instructions in a sequential instruction stream to get increased throughput, and it lessens the total time to complete the work. . The major objective of this architecture is to design a low power high performance structure which fulfils all the requirements of the design. The critical factors like power, frequency, area, propagation delay are analysed using Spartan 3E XC3E 1600e device with Xilinx tool. In this paper, the 32-bit MIPS RISC processor is used in 6-stage pipelining to optimize the critical performance factors. The fundamental functional blocks of the processor include Input/Output blocks, configurable logic blocks, Block RAM, and Digital clock Manager and each block permits to connect to multiple sources for the routing. The Auxiliary units enhance the performance of the processor. The comparative study elevates the designed model in terms of Area, Power and Frequency. MATLAB2D/3D graphs represents the relationship among various parameters of this pipelining. In this pipeline model, it consumes very less power (0.129 W),path delay (11.180 ns) and low LUT utilization (421). Similarly, the proposed model achieves better frequency increase (285.583 Mhz.), which obtained better results compared to other models.
Chapter
RISC V architecture is finding its importance with semiconductor industry and academia. With the availability of open instruction, set design of the processor is possible. The RTL needs an extensive verification. Simulation-based methods are rampant, but exhaustive test generations are required. The papers reports design and System Verilog verification of the five-stage RISC V processor. Mentor Questa simulator is used to verify the design. The code coverage reported is 80%.
Chapter
The main aim is to implement 128-bit RISC processor using pipelining techniques through FPGA with the help of von Neumann architecture. With the increase in the use of the FPGA in various embedded applications, there is a need to support processor designs on FPGA. The type of processor proposed is a soft processor with a simple instruction set which can be modified according to use because of the reconfigurable nature of FPGA. The type of architecture implemented is von Neumann. Prominent feature of the processor is pipelining which improves the performance considerably such that one instruction is executed per clock cycle. Due to the increase in innovations in the development of processors, the increasing popularity of open source projects like RISC-V ISA (Instruction Set Architecture), there is a need to also rapidly understand these designs and also upgrade them which can easily be performed on FPGA with trade off in speeds and size as compared to commercial ASIC processors, and hence, we are motivated to understand these systems. In this paper, a 128-bit RISC processor is implemented using FPGA pipelining.KeywordsRISC—reduced instruction set computerFPGA—field programmable gate arrayISA—instruction set architectureASIC—application specific integrated circuit
Chapter
Full-text available
In cloud computing technology, task scheduling is one of the research challenges. For these various algorithms, works such as particle swarm optimization (PSO), firefly algorithm, ant colony optimization (ACO) and genetic algorithm (GA). PSO is inspired by the bird’s movement, and ACO is based on the behaviour of ants. GA works based on the natural evolution process. This paper presents the hybrid of PSO-ACO-GA for task scheduling on virtual machines of cloud computing known as ant particle swarm genetic algorithm (APSGA). Here, GA and PSO will perform iteration to get the task basis on fitness value and further ACO will distribute the task on specific virtual machines. This paper has achieved improved results for parameters such as CPU utilization, makespan and execution time. Our proposed algorithm has achieved makespan that is reduced by 27.1%, 19.45% and 21.24% with compare to PSO, ACO and GA, respectively. It has achieved maximum of CPU utilization and execution time.
Chapter
RISC-V is a free and open instruction set architecture (ISA) based on reduced instruction set computer (RISC) principles. RISC-V ISA enables a new phase in the field of processors through open standard association. The address of RISC-V is based on 32-bit and 64-bit variants. The essential RISC-V is a 32-bit integer instruction set defined as RV32I, which efficiently supports the operating system environments and also suits for the embedded system applications. In this paper, a survey is carried for 5-stage in-order pipeline implementation and ways to overcome pipelining hazards for structural hazards, data hazards, and control hazards on RISC-V processors. Being open-source and free, this is adopted in many commercial and academic research and projects.
Chapter
Natural language interfaces are gaining popularity as an alternative interface for non-technical users. Natural language interface to database (NLIDB) systems have been attracting considerable interest recently that are being developed to accept user’s query in natural language (NL), and then converting this NL query to an SQL query, the SQL query is executed to extract the resultant data from the database. This Text-to-SQL task is a long-standing, open problem, and towards solving the problem, the standard approach that is followed is to implement a sequence-to-sequence model. In this paper, I recast the Text-to-SQL task as a machine translation problem using sequence-to-sequence-style neural network models. To this end, I have introduced a parallel corpus that I have developed using the WikiSQL dataset. Though there are a lot of work done in this area using sequence-to-sequence-style models, most of the state-of-the-art models use semantic parsing or a variation of it. None of these models’ accuracy exceeds 90%. In contrast to it, my model is based on a very simple architecture as it uses an open-source neural machine translation toolkit OpenNMT, that implements a standard SEQ2SEQ model, and though my model’s performance is not better than the said models in predicting on test and development datasets, its training accuracy is higher than any existing NLIDB system to the best of my knowledge.
Rakesh MR, Ajeya B, Mohan AR. Novel architecture of 17 bit address RISC CPU with pipelining technique using Xilinx in VLSI Technology. International Journal of Engineering Research and Applications. 2014; 4(5):116-121.
Rakesh MR. Design and simulation of four stage pipelining architecture using the Verilog. International Journal of Science and Research. 2014; 3(3):108-12.
Rana S, Mehra R. Design & simulation of RISC processor using hyper pipelining technique. IOSR Journal of Mechanical and Civil Engineering (IOSR-JMCE). 2013; 9(2):49-57.
Grunbacher H. Teaching computer architecture/organisation using simulators. 28th Annual Frontiers in Education Conference, FIE'98. Treitlstrasse Vienna Austria. 1998; 3:1107-12.
Osée M, Richard A, Biest AV, Mathys P. Educational simulation of the RiSC processor. International Conference on Engineering Education (ICEE 2007); 2007.