A guideline for rapid development of assembler to target tailor-made microprocessors



Ehsan Ali
Department of Electrical Engineering
Chulalongkorn University
Bangkok, Thailand
Wanchalerm Pora
Department of Electrical Engineering
Chulalongkorn University
Bangkok, Thailand
Abstract—The emergence of FPGA technology and rapid
advances in EDA tools and HDLs enable engineers
to design the hardware for general-purpose or application-specific
microprocessors swiftly. However, the software suite that such a
processor needs cannot be developed as quickly. In this paper, a guideline
based on the LLVM infrastructure is suggested for the rapid development of an
assembler. To demonstrate it, a simple 16-bit
microprocessor, called Laser, is implemented on an FPGA.
The emergence of FPGA technology and advances in EDA
tools enable engineers to design the hardware for general-purpose
or application-specific microprocessors swiftly. After the processor
has been described in an HDL and synthesized into an FPGA device,
the next step is to develop a software toolchain for it. In the past,
this was done by writing the whole toolchain from scratch
with a traditional C/C++ compiler. This practice is very
time-consuming, and its result can hardly be reused because it is
not modular. A better way is to develop
the toolchain with cross compilers/assemblers such as GCC
[1], AXASM [2], or TDASM [3]. This approach is less
time-consuming, but the lack of modularity persists.
The LLVM compiler project [4] was founded to address both
modularity and reusability. Despite its name, it is not just a
compiler but an infrastructure composed of several software
development tools. The project releases common,
platform-independent front-ends. One has to develop only the
corresponding platform-dependent back-ends in order to produce a
software suite that supports the newly developed microprocessor
hardware. With this approach the back-ends can be ported to suit
future microprocessors with little effort.
Only two major infrastructures provide
extensible back-ends: (1) the well-known GNU Compiler
Collection (GCC) and (2) the LLVM infrastructure. Both of
them support many processors such as i386, IA64, MIPS,
SPARC, ARMv7, and AVR [5]. Although GCC is older and
better known, both infrastructures have reached a high
level of maturity. Besides supporting major CPU architectures,
several toolchains have been ported to target tailor-made CPU
platforms. For example, Chen Chung-Shu created several
LLVM back-ends for a simplified 32-bit RISC processor
named Cpu0 [7]. Its software suite consists of a compiler,
an assembler, and a disassembler. Chen reused some modules
from MIPS. Earlier, Christoph developed LLVM back-ends for the
TriCore architecture [8], which was the principal reference of
Cpu0. Simon developed LLVM back-ends for OpenRISC 100
All the works mentioned above attempted to create full-blown
LLVM back-ends for their target processors. This means that, apart
from implementing the pure back-end, the engineer must implement
the extra parts that connect it to a third-generation language
(such as C/C++). These include function call argument passing,
frame lowering, arithmetic and logic instructions, control flow
statements, etc. The engineers therefore still need considerable
time to develop a back-end. In contrast, this paper tries to decouple
these links in order to reduce the development time of a
back-end. By taking advantage of the available LLVM MC
(Machine Code) framework, we suggest a guideline for producing
only an assembler, while retaining the possibility of adding the
other tools in the future.
To demonstrate the software development steps, a 16-bit
processor, called Laser, is developed. The design emphasizes
minimizing hardware resources, so that it can be implemented
on small CPLD or FPGA devices. The hardware
design is composed of approximately 2000 lines of HDL code
and is available in the public domain at
ehsan-ali-th/laser. Ease of assembler development was considered in
designing its instruction set. It contains 31 instructions. Apart
from one 32-bit instruction, all are 16 bits long.
Nevertheless, the variable-width instruction and the support of
several addressing modes make the assembler sophisticated
enough to serve as a model for future RISC processor designs.
A. LLVM Back End Pipeline
The most important thing about LLVM is that it is designed
as a collection of libraries [17]. This lets us skip the front-ends
and optimization libraries and focus solely on the back-ends.
LLVM has a pipeline structure, where instructions travel
through several phases as shown in Fig. 1.
Initially the instructions are in IR (Intermediate Representation)
form, which has the SSA (Static Single Assignment) property,
and as they go through each pass they get converted from
one C++ class to another: LLVM IR → SelectionDAG →
MachineDAG → MachineInstr → MCInst.
The input to the LLVM MC framework is the MCInst class and
the output is the machine-specific object code.
Fig. 1. Back End Pipeline. [11]
978-1-5386-3555-1/18/$31.00 ©2018 IEEE
The Laser microprocessor is a 16-bit processor with a fixed-size
16-bit instruction width, a 16-bit memory address, and a
16-bit data bus. It is a RISC-like architecture designed to
fit into low-end FPGA devices targeting embedded systems.
A. Instruction Encoding
The bit encoding for the Laser 16-bit instruction set is shown
in Table I. (Instructions with a similar format have been omitted
to save space.)
B. Instruction Description
The opcode field uses 5 bits, which accommodates
32 distinct instructions. There are three operands:
RD (4 bits wide), RS (4 bits wide), and RT (3 bits wide). The
total number of registers is 16. Eight of them are general-purpose
registers, R8 to R15, and the remaining eight are
special registers.
The supported addressing modes are: (1) Immediate [11 bits],
(2) Displacement [11 bits], and (3) Register Indirect [16 bits].
The description of each instruction is shown in Table II.
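The field widths above (5 + 4 + 4 + 3 = 16 bits) can be illustrated with a small packing routine. This is only a sketch: the opcode values are copied from Table I, but the exact bit positions (opcode in bits 15-11, RD in 10-7, RS in 6-3, RT in 2-0) and the direct use of the register number as the field value are assumptions read off the table layout, and the `encode` helper is hypothetical rather than part of the Laser back-end.

```python
# Opcode values are copied from Table I; the field positions are an assumed
# reading of that table (opcode 15-11, RD 10-7, RS 6-3, RT 2-0).
OPCODES = {"MOV": 0b00000, "ADD": 0b00001, "INC": 0b00101}


def encode(mnemonic, rd=0, rs=0, rt=0):
    """Pack one 16-bit Laser instruction word from its fields (hypothetical helper)."""
    if not 0 <= rd < 16 or not 0 <= rs < 16:
        raise ValueError("RD and RS are 4-bit fields")
    if not 0 <= rt < 8:
        raise ValueError("RT is a 3-bit field")
    return (OPCODES[mnemonic] << 11) | (rd << 7) | (rs << 3) | rt


mov_word = encode("MOV", rd=8, rs=9)  # MOV with RD = R8 and RS = R9
inc_word = encode("INC", rd=8)        # INC with RD = R8
print(f"{mov_word:04X} {inc_word:04X}")  # prints "0448 2C00"
```

Unused fields are left as zero here; whether the hardware ignores them is not stated in the text.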
A. Getting The LLVM Infrastructure
At the time of writing, the current LLVM version is 6.0.0.
In this paper we omit the source code and discuss only
the most important mechanisms. The locations of all files are
relative to LLVM_ROOT, the directory in which the LLVM
source code resides. We get the LLVM source code plus the Clang
compiler by issuing the following commands:
Table I
No.  Instruction  Opcode          Destination Reg.  Source Reg.  Target Reg.
                  15 14 13 12 11  10 9 8 7          6 5 4 3      2 1 0
0    MOV          0  0  0  0  0   RD                RS
1    ADD          0  0  0  0  1   RD                RS           RT
5    INC          0  0  1  0  1   RD
18   JMP          1  0  0  1  0   *                 *            *
23   IMD          1  0  1  1  1   RD
26   CLRC         1  0  1  0  1

Table II
No.  Instruction  Description
1    ADD          RT ← (RS + RD)
5    INC          RD ← (RD + 1)
18   JMP          Unconditional short jump to PC+IR[10-0]; IR[10-0] must be in two’s complement form.
23   IMD          RD ← Next 16-bit. (2 Cycles)
26   CLRC         Clears the Carry Flag.
$ svn co
$ cd llvm/tools
$ git clone
B. Target Registration
1) LLVM Target Registration: First edit the following files
(the paths are relative to “LLVM_ROOT”):
CMakeLists.txt
cmake/config-ix.cmake
lib/Support/Triple.cpp
lib/Target/LLVMBuild.txt
lib/Object/ELF.cpp
We then create the folder LLVM_ROOT/lib/Target/Laser and
create inside it the following files, which are the bare bones needed
to support an assembler:
LLVMBuild.txt
LaserTargetMachine.cpp and .h
LaserISelLowering.cpp and .h
LaserInstrInfo.cpp and .h
LaserMCInstLower.cpp and .h
LaserFrameLowering.cpp and .h
LaserISelDAGToDAG.cpp and .h
LaserRegisterInfo.cpp and .h
2) Front End: Clang 16-bit Support: Back-end development
can start once the details of the target processor
are known. LLVM is an enormous project, and if we tried
to understand all of its components before starting to develop the
back-end we would lose the “rapid development” characteristic.
We therefore consider the front end as a black box. It is the job
of the C compiler (Clang) to take the source code and send it
to the optimization stage. Since the Laser processor is a 16-bit
machine, we have to tell Clang to generate 16-bit IR
instructions from C source code.
We create two new files:
LLVM_ROOT/tools/clang/lib/Basic/Targets/Laser.h
LLVM_ROOT/tools/clang/lib/Basic/Targets/Laser.cpp
and add the Laser target to Clang by editing:
LLVM_ROOT/tools/clang/lib/Basic/Targets.cpp
LLVM_ROOT/tools/clang/lib/Basic/CMakeLists.txt
At this point we can compile the LLVM project using the
following command from an empty build directory:
$ cmake3 -G "Ninja" -DCMAKE_BUILD_TYPE= /path/to/llvm/source/code && ninja && ninja install
Now Clang can produce 16-bit LLVM code:
$ clang -target=laser -S -emit-llvm main.c -o main.ll
which reads the main.c file and writes 16-bit LLVM IR to the
main.ll file.
To show how the components plug together, we use a diagram
that tracks the life of a sample
ret instruction through the LLVM back-end pipeline [14].
Suppose we have the following C code:
int main () {return 0; }
Running clang -O0 -target=laser -S -emit-llvm main.c -o
main.ll outputs main.ll with the LLVM IR “ret i16 0”. From
there the LLVM IR is transformed into several forms, as
shown in Fig. 2. The gray background boxes show the ret
instruction in different forms as it travels in each back-end
Fig. 2. The life of Laser ret instruction.
The main goal of this paper is the rapid development of a
modular assembler; therefore we take a novel approach and
support only three main components: (1) “function calls”,
which are lowered to the Laser CALL instruction (argument
passing, frame lowering, etc. are ignored); (2) “inline
assembly”, using the asm() directive; and (3) “labels and
goto”, which are lowered to the Laser JMP instruction.
These components enable us to use the following code structure:
void count() {
    asm("imd %r8, #0");        // Put 0 into R8
    asm("imd %r9, #1");        // Put 1 into R9
    asm("imd %r10, #0");       // Put 0 into R10
start:
    asm("add %r8, %r9, %r8");  // R8 = R8 + R9
    asm("out %r10, %r8");      // out R8 to port 0
    goto start;
}
int main() { count(); return 0; }
A. Function call
Below are the details of every step that a C function
call goes through:
1) The “count();” C statement is translated by Clang into the
LLVM IR “call void @count()”.
2) Then LowerCall() is called to store the
outgoing arguments in the caller function.
There we create LASERISD::LaserCall with a
TargetGlobalAddress:i16<void ()* @count> operand.
We also set the operand flag to LaserII::MO_CALL_FLAG.
This is the starting point of saving the call
target address as a static relocation in an ELF file.
3) We have defined CALL as an instance of the F3 class,
which is derived from the FJ class, to match the
pattern (LaserCall imm:$target). LaserCall is an SDNode
with “LASERISD::LaserCall” as its opcode, defined in
4) After legalization, in the instruction selection phase, we
match and replace:
def : Pat<(LaserCall (i16 tglobaladdr:$dst)), (CALL (IMD tglobaladdr:$dst))>;
5) At the end, after register allocation and instruction
scheduling, we have the following (the form can be examined by
looking at the output of “llc -print-machineinstrs or -
renamable $r10 = IMD @count
CALL killed renamable $r10, <regmask $r8 $r9>, implicit-def $sp
6) Now the instruction is in MachineInstr form.
7) From now on we enter the MC framework. We
lower MachineInstr operands in LaserMCInstLower.cpp
by calling LaserMCInstLower::LowerSymbolOperand().
For the “call” instruction the operand is
MachineOperand::MO_GlobalAddress, so we set the Symbol
value to AsmPrinter.getSymbol(MO.getGlobal());
and we set TargetKind = LaserMCExpr::VK_LASER
8) The getMachineOpValue() function calls getExprOpValue()
in MCTargetDesc/LaserMCCodeEmitter.cpp
when the operand is neither an immediate nor a register;
getExprOpValue() then saves the “fixup_Laser_CALL16” and returns 0.
9) To emit a .s file we use the command llc -march laser
-mcpu=generic -filetype=asm -o main.s main.ll.
This invokes LaserAsmBackend::applyFixup()
for “fixup_Laser_CALL16” at the provided offset, and then
LaserInstPrinter::() writes the instructions into the .s file.
10) To emit a .o file we use llc -march laser
-mcpu=generic -filetype=obj -o main.o main.ll, which
invokes LaserMCCodeEmitter::EmitInstruction(). There,
if the instruction operand is a LaserMCExpr of type
VK_LASER_CALL16, it allocates space in the object
file, writes 0, and adds the “fixup_Laser_CALL16”
to the relocation table of the output ELF file.
11) Finally, by linking the object files using lld (another
tool available under the LLVM project umbrella), the
correct value of the call target address is calculated
and written into the proper offset associated
with the recorded relocation symbol.
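The placeholder-and-patch mechanism in steps 8 to 11 can be modelled in a few lines. The following is a toy model, not LLVM or lld code: the `Fixup` record, `emit_imd_symbol`, and `resolve` are hypothetical names, the IMD opcode is taken from Table I, and it is assumed that the fixup covers the 16-bit word following IMD (the word that Table II says is loaded into RD) and that the symbol address 0x0040 is purely illustrative.

```python
from dataclasses import dataclass

IMD_OPCODE = 0b10111  # from Table I


@dataclass
class Fixup:
    word_index: int  # which 16-bit word of the code to patch
    symbol: str      # symbol whose final address is needed
    kind: str        # fixup kind name, as used in the text


def emit_imd_symbol(code, fixups, rd, symbol):
    """Emit IMD RD, then a zero placeholder word, and record a fixup for it."""
    code.append((IMD_OPCODE << 11) | (rd << 7))
    fixups.append(Fixup(len(code), symbol, "fixup_Laser_CALL16"))
    code.append(0)  # placeholder, as in step 10


def resolve(code, fixups, symbol_table):
    """Play the linker's role: overwrite each placeholder with the final address."""
    patched = list(code)
    for fx in fixups:
        patched[fx.word_index] = symbol_table[fx.symbol] & 0xFFFF
    return patched


code, fixups = [], []
emit_imd_symbol(code, fixups, rd=10, symbol="count")
linked = resolve(code, fixups, {"count": 0x0040})  # 0x0040 is an illustrative address
print([f"{w:04X}" for w in linked])  # prints ['BD00', '0040']
```

Keeping the unresolved code and the fixup list separate mirrors the split between the object file contents and its relocation table.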
B. Inline Assembly
In order to support inline assembly we need to add an
AsmParser module:
- Create LLVM_ROOT/Laser/AsmParser/
The LaserAsmParser class is derived from the MCTargetAsmParser
class. It has a MatchAndEmitInstruction() function,
which is called for each instruction that needs to
be parsed and which then emits the binary representation of that
instruction. Other supporting functions help
to parse the operands and emit the proper machine code.
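To make the operand-parsing step concrete, here is a standalone sketch of splitting one assembly line into a mnemonic and typed operands. It does not use the LLVM MCTargetAsmParser API; `parse_instruction` is a hypothetical helper, and the operand syntax (`%rN` for registers, `#N` for immediates) is assumed from the asm() strings in the example program above.

```python
import re

# Assumed operand syntax, taken from the asm() strings in the example program:
# registers look like %r8 and immediates look like #0.
_OPERAND = re.compile(r"%r(\d+)|#(-?\d+)")


def parse_instruction(line):
    """Split one assembly line into (mnemonic, [(kind, value), ...])."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = []
    for token in rest.split(","):
        token = token.strip()
        if not token:
            continue  # instruction without operands, e.g. "clrc"
        match = _OPERAND.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse operand: {token!r}")
        if match.group(1) is not None:
            reg = int(match.group(1))
            if reg > 15:
                raise ValueError(f"no such register: {token!r}")
            operands.append(("reg", reg))
        else:
            operands.append(("imm", int(match.group(2))))
    return mnemonic.lower(), operands


print(parse_instruction("imd %r8, #0"))
print(parse_instruction("add %r8, %r9, %r8"))
```

A real target parser would go on to match the parsed operands against the instruction definitions before emitting machine code; that matching step is omitted here.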
C. Label, Jump, and Goto
Clang converts a goto keyword in a C source file into the
LLVM IR “br label %label_name”. We simply match the (br
bb:$address) pattern with the Laser JMP instruction, with an
incoming argument of type LASERjmptarget11. In the LaserIn- file, LASERjmptarget11 is defined with
the OperandType = “OPERAND_PCREL” property and
EncoderMethod = “getJumpTarget11OpValue”. The
getJumpTarget11OpValue() function is defined in
MCTargetDesc/LaserMCCodeEmitter.cpp; it adds “fixup_Laser_PC11” to the
relocation table of the output ELF file and writes 0 to the address
field (11 bits) of the jump instruction. The “lld” tool recognizes
“fixup_Laser_PC11” as a PC-relative address, calculates
the final jump address, and rewrites it into the executable ELF file.
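The 11-bit two's complement field and its later patching can be sketched as follows. This is a model of the arithmetic only, not the lld implementation: the JMP opcode comes from Table I, the helper names are hypothetical, and it is assumed that the offset is counted in instruction words relative to the address of the JMP itself (the text does not state the reference point or the unit).

```python
JMP_OPCODE = 0b10010  # from Table I


def encode_jmp(offset):
    """Build a JMP word whose low 11 bits hold a two's complement offset."""
    if not -1024 <= offset <= 1023:
        raise ValueError("offset does not fit in 11 bits")
    return (JMP_OPCODE << 11) | (offset & 0x7FF)


def decode_offset(word):
    """Sign-extend the 11-bit offset field of a JMP word."""
    field = word & 0x7FF
    return field - 0x800 if field & 0x400 else field


def apply_pc11_fixup(word, jmp_address, target_address):
    """Patch a placeholder JMP once both addresses are known (linker's role)."""
    offset = target_address - jmp_address  # assumed reference point: the JMP itself
    if not -1024 <= offset <= 1023:
        raise ValueError("jump target out of range")
    return (word & 0xF800) | (offset & 0x7FF)


placeholder = encode_jmp(0)                    # address field written as 0
patched = apply_pc11_fixup(placeholder, 5, 3)  # jump two words backwards
print(f"{placeholder:04X} {patched:04X} {decode_offset(patched)}")  # prints "9000 97FE -2"
```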
D. Linker
To add lld support we first download its source code:
$ cd llvm/tools
$ svn co http://llvm.org/svn/llvm-project/lld/trunk lld
Then we change the following files in
LLVM_ROOT/Laser/lld/ELF/ to add ELF support for
the Laser processor to the lld tool (LLD 7.0.0):
- Driver.cpp - Target.h - CMakeLists.txt - Laser.cpp.
After recompiling LLVM we use the following command
to get the a.out file:
$ ld.lld -e main main.o
The complete toolchain consists of the following stages (the
associated output file formats are given in parentheses):
Clang (LLVM IR) → llc (.o object ELF) → lld
(a.out executable ELF) → FPGA RAM Block [15]
Clang produces the LLVM IR code and llc generates ELF
object files with static relocations. lld resolves the relocations
in the object files and produces an executable a.out file. Finally,
we extract the machine code using
$ elf2hex -arch-name=laser >a.hex
Next, hex2coe (written by the author in C) generates
an a.coe file. We then load the .coe file into the FPGA
ROM Block for execution by generating a bitstream file.
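The author's hex2coe tool is not reproduced in the paper. As an illustration of what this last conversion step produces, the sketch below formats a list of 16-bit words as a Xilinx-style .coe memory initialization file. The function name `words_to_coe` is hypothetical, and the one-word-per-line layout is an assumption about how the ROM is organized, not a description of the author's tool.

```python
def words_to_coe(words):
    """Format 16-bit words as the text of a .coe memory initialization file."""
    header = "memory_initialization_radix=16;\nmemory_initialization_vector=\n"
    # One 4-digit hex word per line, comma separated, with a closing semicolon.
    body = ",\n".join(f"{w & 0xFFFF:04X}" for w in words)
    return header + body + ";\n"


# Two illustrative instruction words; real input would come from a.hex.
print(words_to_coe([0xBD00, 0x0040]), end="")
```

The two keywords used here, memory_initialization_radix and memory_initialization_vector, are the standard fields of the COE format; everything else about the memory layout would have to match the actual block memory configuration.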
In this paper a novel approach for the rapid development of
a modular assembler has been proposed. An LLVM back-end
for the 16-bit Laser soft processor has been developed.
The processor design, in conjunction with the LLVM back-end,
makes it possible to come up with a new processor
design and evaluate its performance by running compact
benchmarking programs written in assembly language. The
complete Laser back-end source code can be found online at
We would like to thank Prof. Ekachai Leelarasmee and
Asst. Prof. Kittiphan Techakittiroj for their ceaseless support
during the period in which this research was conducted. We
would also like to thank Chulalongkorn University for granting
the “100th Anniversary Chulalongkorn University Fund for
Doctoral Scholarship” to the author.
[1] GCC, the GNU Compiler Collection:
[2] axasm by Al Williams, A universal cross assembler:
[3] Table Driven Assembler: niki/tdasm/
[4] The LLVM Compiler Infrastructure:
[5] Status of Supported Architectures from Maintainers’ Point of View:
[6] “LLVM Language Reference Manual”
[7] The LLVM Compiler Infrastructure:
[8] Design and Implementation of a TriCore Backend for the LLVM
Compiler Framework. Christoph Erhardt. https://wwwcip.informatik.uni- sicherha/foo/tricore-llvm.pdf
[10] Simon Cook, “Howto: Implementing LLVM Integrated Assembler A
Simple Guide”, Application Note 10, Issue 1, October 2012. Copyright
2012 Embecosm Limited.
[11] Bruno Cardoso Lopes, Rafael Auler, “Getting Started with LLVM Core
Libraries”, Production reference: 1200814, August 2014, ISBN 978-1-78216-692-4.
[12] “The LLVM Target-Independent Code Generator”
[13] “TableGen Language Reference”
[14] Eli Bendersky, “Life of an instruction in LLVM”
[15] “Tutorial: Creating an LLVM Toolchain for the Cpu0 Architecture”,
[16] Mayur Pandey, Suyog Sarda, “LLVM Cookbook”, 2015.
[17] Mayur Pandey, Suyog Sarda, “LLVM Essentials”, 2015.