All content in this area was uploaded by Khaled Elleithy on Oct 04, 2017
Analysis and Implementation of Proficient Rijndael Algorithm with
Optimized Computation
Susrutha Babu Sukhavasi, Suparshya Babu Sukhavasi, Khaled Elleithy
Department of Computer Science and Engineering
University of Bridgeport, Bridgeport, USA
ssukhava@my.bridgeport.edu, susukhav@my.bridgeport.edu, elleithy@bridgeport.edu
Abdelrahman Elleithy
Department of Electrical Engineering and Computer Science
Texas A&M University-Kingsville
Kingsville, Texas
Abdelrahman.Elleithy@tamuk.edu
Abstract—Security is a major concern in any type of communication system. Generating secret keys randomly provides good security as well as greater complexity in cryptographic algorithms. The literature contains symmetric-key algorithms, which use the same key for encryption and decryption. Among these, the Advanced Encryption Standard (AES) is used in many fields of communication security. In this paper, we implement the 128-bit AES algorithm in a hardware description language. The proposed implementation demonstrates efficient results in delay and area. Furthermore, we provide a comparison with the corresponding existing implementation.
Keywords—Advanced Encryption Standard (AES), Decryption, Encryption, FPGA implementation, RTL.
I. INTRODUCTION
To prevent the loss of information and avoid cyber attacks, individuals have to take the necessary precautions to reduce cybercrime. This can be achieved only through highly reliable communication over the internetwork, strong and compact encryption techniques [1], and a trustworthy third party for recording the data. Messages need stronger protection against malicious attacks. Cryptography is used to transfer data or provide vital information to clients or users over a universal network such as the internet. It assures that only the intended recipients can read or access that particular data. The required protection is usually obtained by encrypting the information through safety procedures that depend on the system's hardware and software. Moreover, hackers use viral code to corrupt the data and make it unrecoverable, which can destroy the entire system. Hence, a hardware implementation is more effective than a software one at defending the system from the disastrous threats posed by malicious software.
Employing hardware increases the area, which should be minimized for best results. After analyzing the available cryptographic techniques, the Advanced Encryption Standard (AES) was identified as the most suitable scheme for implementation in VLSI [2]. To develop significant ciphers, AES-based ASICs (Application Specific Integrated Circuits) and mutant processors are expected to be devised in the future [3]. AES is a modified form of the Rijndael algorithm [4] and is suitable for both hardware and software implementation. AES can be applied on most processors to satisfy the output requirements when power minimization is not the focus. Hence, the basic requirements in VLSI, namely higher speed, low power, and small area, can be met by implementing the AES algorithm in hardware description languages such as VHDL and Verilog.
Two Belgian cryptographers, Vincent Rijmen and Joan Daemen, created AES [5] as an iterated block cipher that uses a 128-bit block with three key sizes: 128, 192, and 256 bits [6]. AES is also called the Rijndael symmetric algorithm [7] because it uses the same key to encrypt and decrypt a message, generating cipher text whose size equals that of the plain text message.
Depending on the width of the datapath, applications of AES are classified into three kinds, the first of which is meant for producing the key used to encrypt and decrypt the data. Encryption and decryption use the same key and operate on four-byte words whose bytes are elements of the Galois field GF(2^8). Addition in this field is performed with XOR gates, which suits the computations done by computers. Multiplication in GF(2^8) is carried out modulo an irreducible binary polynomial of degree 8, that is, a polynomial with no divisors other than 1 and itself.
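The field arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `gf_add` and `gf_mul` are hypothetical, and the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is the one specified for AES in FIPS-197 rather than a value stated in this paper.

```python
# Illustrative sketch of GF(2^8) arithmetic; helper names are hypothetical.
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES reduction polynomial (FIPS-197)


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^8) is a bitwise XOR of the two bytes."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as polynomials over GF(2), reduced modulo AES_POLY."""
    result = 0
    for _ in range(8):
        if b & 1:            # add the current multiple of a if this bit of b is set
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree reached 8: reduce by the irreducible polynomial
            a ^= AES_POLY
    return result


print(hex(gf_add(0x57, 0x83)), hex(gf_mul(0x57, 0x83)))
```

In hardware the same shift-and-XOR structure maps directly onto XOR gates, which is one reason the field suits compact implementations.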
In AES, data can be encoded with key sizes of 128, 192, or 256 bits using 10, 12, or 14 rounds respectively, depending on the key length. Every round involves four transformations: Add Round Key, Shift Rows, Sub Bytes, and Mix Columns. First, Add Round Key performs an XOR between the 128-bit key and the data, and is executed for initialization as displayed in Figure 1. Shift Rows cyclically shifts each four-byte row of the data by an offset of 0 to 3. Sub Bytes uses the S-box [8, 9], a substitution table that converts each byte independently of the other bytes of the current state; it is built by combining two transformations, a multiplicative inverse in the finite field GF(2^8) followed by an affine transformation.
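As a rough illustration of the Sub Bytes and Shift Rows steps, the sketch below computes one S-box entry on the fly and rotates the rows of a state. It assumes the FIPS-197 definition of the S-box (multiplicative inverse in GF(2^8) followed by an affine transformation with constant 0x63); the helper names are hypothetical, and a hardware design would normally use a look-up table or combinational logic instead.

```python
# Illustrative sketch; helper names are hypothetical, constants follow FIPS-197.
def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):     # a^254 equals a^-1 because a^255 = 1
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(byte: int) -> int:
    """Inverse in GF(2^8) followed by the affine transformation."""
    inv = gf_inv(byte)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


def shift_rows(state):
    """Rotate row r of a 4x4 state (list of rows) left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


print(hex(sbox(0x53)))
```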
Figure 1: Advanced Encryption Standard Algorithm
The last operation, Mix Columns, treats the four bytes of every column of the block as the coefficients of a four-term polynomial and multiplies it by a fixed polynomial modulo x^4 + 1. Finally, decoding is performed to recover the original data.
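The Mix Columns step can be sketched for a single column as follows. The fixed polynomial is assumed to be the FIPS-197 one, {03}x^3 + {01}x^2 + {01}x + {02}, which is not spelled out in the text above; the function names `xtime` and `mix_column` are hypothetical.

```python
# Illustrative sketch of Mix Columns on one four-byte column.
def xtime(a: int) -> int:
    """Multiply a byte by x (that is, by {02}) in GF(2^8)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B           # reduce modulo x^8 + x^4 + x^3 + x + 1
    return a & 0xFF


def mul3(a: int) -> int:
    """Multiply a byte by {03} = {02} + {01}."""
    return xtime(a) ^ a


def mix_column(col):
    """Multiply the column polynomial by the fixed polynomial modulo x^4 + 1."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because the only multipliers are {01}, {02}, and {03}, each output byte needs just shifts and XORs, which keeps the module small in hardware.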
II. RELATED WORK
Encryption and decryption take place in 10, 12, or 14 rounds, chosen mainly according to the block size and key length. A cryptographic round generates the intermediate data, and a separate key schedule generates the corresponding round subkeys from the initial key. The encryption round is divided into four stages, namely key addition, byte substitution, shift row, and mix column, as shown in Figure 1. In decryption, however, the stages differ even though the subkeys for encryption and decryption are identical, so different software is needed for the encryption and decryption phases, which is a drawback. As an alternative, an equivalent version with the same structure for both phases was developed. To achieve this equivalence, the inverse mix column operation is applied to the round subkeys. This change makes the implementation of the encryption and decryption modes more efficient in terms of area.
In private-key cryptography, such as DES, a single key is used for both encryption and decryption, as shown in Figure 2. Alternatively, in public-key cryptography, a pair of private and public keys is used for the encryption and decryption operations, as shown in Figure 3.
Figure 2: Private-key cryptography
Figure 3: Public-key cryptography
III. PROBLEM IDENTIFICATION
Because of its low cost and better performance, we suggest an inner-pipelined [10] VLSI implementation of the AES algorithm. This architecture [11] divides the 128-bit processing in each normal round into two 64-bit computations, which is less expensive to implement in hardware [12] and requires approximately 12,000 equivalent gates. This implementation is mainly intended for portable devices such as mobile phones. Most hardware implementations [13] of the AES algorithm process all 128 bits simultaneously in every round [14]. Such architectures give high throughput, but at a high hardware cost. Because AES is a block computation, the 128-bit calculation can be divided into two 64-bit computations. This scheme has reasonable throughput but needs only four Sub Byte modules and one Mix Column module, compared with the sixteen Sub Byte modules and four Mix Column modules of the 128-bit parallel scheme. The Shift Row module is composed only of wires. If one 64-bit computation is done in one clock, one round takes two clocks to complete. Moreover, this implementation needs a complex Mix Column module that has to be serially connected with the Sub Byte modules, which lengthens the critical path and decreases the chip speed.
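The clock-count trade-off between the 128-bit parallel and 64-bit split datapaths can be estimated with a small calculation. This is a first-order sketch under stated assumptions: the helper `clocks_per_block` is hypothetical, it assumes one datapath-wide computation per clock, and it ignores the initial key addition, key scheduling, and pipeline fill latency.

```python
# Hypothetical first-order estimate; not a model of the proposed design's timing.
def clocks_per_block(rounds: int, datapath_bits: int, block_bits: int = 128) -> int:
    """Clock cycles needed to push one block through `rounds` rounds."""
    if datapath_bits <= 0 or block_bits % datapath_bits != 0:
        raise ValueError("datapath width must evenly divide the block size")
    passes_per_round = block_bits // datapath_bits   # e.g. 2 passes for a 64-bit datapath
    return rounds * passes_per_round


for width in (128, 64):
    print(f"{width}-bit datapath: {clocks_per_block(10, width)} clocks per block")
```

Halving the datapath width doubles the clock count per block, so the narrower design only wins if the saved area or the shorter critical path outweighs the extra cycles.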
Figure 2 (diagram labels): Plain Text -> Encryption (Private Key) -> Cipher Text -> Decryption (Private Key) -> Plain Text
Figure 3 (diagram labels): Plain Text -> Encryption (Public Key) -> Cipher Text -> Decryption (Private Key) -> Plain Text
IV. MATHEMATICAL MODEL
A 128-bit block can be represented as a 4x4 state array of bytes, as shown in Figure 4.
B0 B4 B8 B12
B1 B5 B9 B13
B2 B6 B10 B14
B3 B7 B11 B15
Figure 4: State Matrix
Each column consists of 4 bytes, i.e., 32 bits, and is called a word. Each round processes an input state array and generates the corresponding output state array. The last round produces the output state array, which is rearranged into a 128-bit output block. AES is a substitution-permutation network: each round consists of a byte-level substitution followed by word-level permutations.
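The mapping between a 128-bit block and the state array can be sketched as follows. The helper names `block_to_state` and `state_to_block` are hypothetical; the column-major layout, in which byte B(r + 4c) sits at row r and column c, follows the state matrix of Figure 4.

```python
# Illustrative sketch of the block <-> state mapping; helper names are hypothetical.
def block_to_state(block: bytes):
    """Arrange 16 bytes column by column into a 4x4 state (list of rows)."""
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_block(state) -> bytes:
    """Read the state back out column by column into a 16-byte block."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


state = block_to_state(bytes(range(16)))
for row in state:
    print(row)
```

Each column of the resulting state is one 32-bit word, which is why word-oriented datapaths read the block four bytes at a time.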
AES Encryption Key Expansion
Key scheduling expands the key matrix, which has four columns, into 44 words (W0 to W43), as shown in Figure 5.
B0 B4 B8 B12
B1 B5 B9 B13
B2 B6 B10 B14
B3 B7 B11 B15
Figure 5: Key Scheduler
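The expansion of a 128-bit key into 44 words can be sketched in software as below. This follows the key schedule of FIPS-197 (RotWord, SubWord, and a round constant on every fourth word) and is not the authors' HDL; the function names are hypothetical, and the S-box is generated at start-up rather than stored as a table.

```python
# Illustrative AES-128 key expansion following FIPS-197; names are hypothetical.
def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def build_sbox():
    """S-box = multiplicative inverse in GF(2^8) followed by the affine map."""
    table = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        table.append(s ^ 0x63)
    return table


SBOX = build_sbox()


def expand_key_128(key: bytes):
    """Return the 44 four-byte words W0..W43 of the AES-128 key schedule."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]   # W0..W3 are the key itself
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # RotWord: rotate left by one byte
            temp = [SBOX[b] for b in temp]      # SubWord: S-box on each byte
            temp[0] ^= rcon                     # add the round constant
            rcon = gf_mul(rcon, 0x02)
        words.append([words[i - 4][j] ^ temp[j] for j in range(4)])
    return words


w = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(w), bytes(w[4]).hex(), bytes(w[43]).hex())
```

Words W0 to W3 serve the initial key addition, and each later group of four words forms the round key for one of the ten rounds.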
The overall block diagram of the algorithm is shown in Figure 6 and contains 10 rounds. Each encryption round performs four steps, namely substitute bytes, shift rows, mix columns, and add round key, as shown in Figure 7. Before encryption begins, the input state array is XORed with the first four words of the key schedule. The last round is an exception: only substitute bytes, shift rows, and add round key are performed, without mix columns.
AES decryption involves four steps, as shown in Figure 7: inverse shift rows, inverse substitute bytes, add round key, and inverse mix columns. A small difference from the other rounds is that, in the last round, encryption has no mix columns step and decryption has no inverse mix columns step.
Figure 6: Block Diagram
Figure 7: Process steps in each round
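The round structure of AES-128 encryption can be put together in a compact software model, which is useful as a reference when checking HDL simulation outputs. This is a sketch following FIPS-197, not the authors' implementation: all function names are hypothetical, the state is held as a flat list in column-major order, and the final round omits the mix columns step.

```python
# Illustrative AES-128 encryption model following FIPS-197; names are hypothetical.
def gf_mul(a, b):
    """GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def build_sbox():
    table = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        table.append(s ^ 0x63)
    return table


SBOX = build_sbox()


def expand_key_128(key):
    """Return 11 round keys, each a flat list of 16 bytes."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 0x02)
        words.append([words[i - 4][j] ^ temp[j] for j in range(4)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def add_round_key(state, round_key):
    return [s ^ k for s, k in zip(state, round_key)]


def sub_bytes(state):
    return [SBOX[b] for b in state]


def shift_rows(state):
    # Flat index i = row + 4 * col; row r is rotated left by r columns.
    return [state[(i % 4) + 4 * ((i // 4 + i % 4) % 4)] for i in range(16)]


def mix_columns(state):
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        out += [
            gf_mul(a[0], 2) ^ gf_mul(a[1], 3) ^ a[2] ^ a[3],
            a[0] ^ gf_mul(a[1], 2) ^ gf_mul(a[2], 3) ^ a[3],
            a[0] ^ a[1] ^ gf_mul(a[2], 2) ^ gf_mul(a[3], 3),
            gf_mul(a[0], 3) ^ a[1] ^ a[2] ^ gf_mul(a[3], 2),
        ]
    return out


def encrypt_block(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt one 16-byte block with a 16-byte key."""
    round_keys = expand_key_128(key)
    state = add_round_key(list(plaintext), round_keys[0])     # initial key addition
    for rnd in range(1, 10):                                  # nine full rounds
        state = mix_columns(shift_rows(sub_bytes(state)))
        state = add_round_key(state, round_keys[rnd])
    state = shift_rows(sub_bytes(state))                      # final round: no mix columns
    return bytes(add_round_key(state, round_keys[10]))


key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
print(encrypt_block(bytes.fromhex("00112233445566778899aabbccddeeff"), key).hex())
```

A model like this trades speed for readability; its value here is as a golden reference against which a pipelined 64-bit datapath can be verified block by block.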
V. PROPOSED SOLUTION
To resolve the computation issues, a pipelined structure [6] is used to complete every normal round computation. This pipelined structure achieves high calculation efficiency at low hardware cost. Inserting one 64-bit register and adding one more clock-enable line is enough to complete one round computation in five clocks. As a result, the critical path is shortened and the speed of the chip increases.
To address this problem, we adopt a computation scheme that reduces the delay as well as the input and output arrival times. First, we implemented the 32-bit AES algorithm [15] in a hardware description language, verified the outputs against the inputs, and calculated the timing. In the proposed technique, the number of look-up tables is reduced while the same number of clock buffers as in the existing system is maintained. The optimized computation is achieved by reducing the number of look-up tables used by the proposed algorithm. With fewer look-up tables, the substitute bytes that form the state matrix during cipher text generation are produced more easily, which in turn reduces the overall delay of the system. Figures 8 to 11 show the implementation details.
Table I compares the existing and proposed implementations. The proposed technique performs better in terms of area and delay, and operates at a lower clock frequency.
Figure 8: Implementation of Algorithm for 128-bit key computation
Figure 9: Synthesis report
Figure 10: Schematic Diagram
Figure 11: RTL Schematic
Parameter      Existing       Proposed
Registers      7910           2182
Slice LUTs     6221           3778
LUT-FFs        2981           1185
Bonded IOBs    18             391
BUFG           1              1
Delay          3.513 ns       2.615 ns
Frequency      284.682 MHz    100.857 MHz
TABLE I: COMPARISON TABLE
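The relative changes implied by Table I can be worked out with a short calculation. The figures below are copied from the table; the helper `percent_change` and the dictionary layout are hypothetical and only serve the arithmetic.

```python
# Relative change of each Table I parameter (negative means a reduction).
TABLE_I = {
    "Registers": (7910, 2182),
    "Slice LUTs": (6221, 3778),
    "LUT-FFs": (2981, 1185),
    "Delay (ns)": (3.513, 2.615),
    "Frequency (MHz)": (284.682, 100.857),
}


def percent_change(existing: float, proposed: float) -> float:
    """Change from the existing to the proposed value, in percent."""
    return 100.0 * (proposed - existing) / existing


for name, (existing, proposed) in TABLE_I.items():
    print(f"{name}: {percent_change(existing, proposed):+.1f}%")
```

On these numbers the register count drops by roughly 72% and the delay by roughly 26%, while the reported frequency is also about 65% lower.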
VI. CONCLUSION
In the proposed design, a hardware description language was used to implement the encryption and decryption operations. The minimum delay obtained from the implementation indicates that the output parameters are obtained correctly. AES can thus be implemented on an FPGA with a delay of about 2.61 ns. This delay varies from circuit to circuit and should be regarded as approximate. The presented method also decreases the number of slices, look-up tables, and the frequency. Accordingly, AES encryption is implemented at a lower frequency than in the existing architectures. The proposed design can be used in low-power applications that are not strongly constrained by area.
REFERENCES
1. Gurpreet Singh, S., A Study of Encryption Algorithms (RSA, DES, 3DES and AES) for Information Security. International Journal of Computer Applications, 2013. 67(19).
2. Xinmiao, Z. and K.K. Parhi, High-speed VLSI architectures for the AES algorithm. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2004. 12(9): p. 957-967.
3. El-Rayis, A.O., T. Arslan, and A.T. Erdogan. Addressing Future Space Challenges using Reconfigurable Instruction Cell Based Architectures. in 2008 NASA/ESA Conference on Adaptive Hardware and Systems. 2008.
4. Daemen, J. and V. Rijmen, The Design of Rijndael. Information Security and Cryptography. Springer-Verlag Berlin Heidelberg, 2002.
5. Advanced Encryption Standard. Federal Information Processing Standards, 2001 (National Institute of Standards and Technology).
6. Shivlal Mewada, P.S., S. S. Gautam, Classification of Efficient Symmetric Key Cryptography Algorithms. International Journal of Computer Science and Information Security, 2016. 14(2): p. 105-109.
7. Kumar Sinha, S., V. Rangari, and K. Kumar Pandey, An Enhanced Symmetric Key Cryptography Algorithm to Improve Data Security. International Journal of Computer Applications, 2013. 74(20): p. 29-33.
8. Razi Hosseinkhani, S.H.H.S.J., Using Cipher Key to Generate Dynamic S-Box in AES Cipher System. International Journal of Computer Science and Security (IJCSS), 2012. 6(1): p. 19-28.
9. Munusamy, K., C. Senthilpari, and D.C.K. Kho. A low power hardware implementation of S-Box for Advanced Encryption Standard. in 2014 11th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). 2014.
10. Hodjat, A. and I. Verbauwhede, Area-throughput trade-offs for fully pipelined 30 to 70 Gbits/s AES processors. IEEE Transactions on Computers, 2006. 55(4): p. 366-372.
11. Chen, R.J., et al. Architecture Design of High Efficient and Non-memory AES Crypto Core for WPAN. in 2009 Third International Conference on Network and System Security. 2009.
12. J.Vijaya, d.m.r., High speed Low Cost Implementation of Advanced Encryption Standard on FPGA. International Journal of Electronics & Telecommunication and Instrumentation Engineering, 2010. 2(3): p. 17.
13. Shaji, N. and P.L. Bonifus, Design of AES Architecture With Area and Speed Tradeoff. Procedia Technology, 2016. 24: p. 1135-1140.
14. Chodowiec, P. and K. Gaj, Very Compact FPGA Implementation of the AES Algorithm, in Cryptographic Hardware and Embedded Systems - CHES 2003: 5th International Workshop, Cologne, Germany, September 8–10, 2003. Proceedings, C.D. Walter, Ç.K. Koç, and C. Paar, Editors. 2003, Springer Berlin Heidelberg: Berlin, Heidelberg. p. 319-333.
15. Van Dyken, J. and J.G. Delgado-Frias, FPGA schemes for minimizing the power-throughput trade-off in executing the Advanced Encryption Standard algorithm. Journal of Systems Architecture, 2010. 56(2–3): p. 116-123.