Analysis and Implementation of Proficient Rijndael Algorithm with
Optimized Computation
Susrutha Babu Sukhavasi, Suparshya Babu
Sukhavasi, Khaled Elleithy
Department of Computer Science and Engineering
University of Bridgeport, Bridgeport
Abdelrahman Elleithy
Department of Electrical Engineering and Computer Science
Texas A&M University-Kingsville
Kingsville, Texas
Abstract—Security is the major aspect in any type of
communication system. Random generation of secret keys
provides good security and raises the complexity of attacks on
cryptographic algorithms. The literature describes
symmetric-key algorithms, which use the same key for encryption
and decryption. Among these algorithms, the Advanced
Encryption Standard (AES) is used in many fields of security in
communications. In this paper, we implement the 128-bit
AES algorithm in a hardware description language. The proposed
implementation demonstrates efficient results in delay and
area. Furthermore, we provide a comparison with the
corresponding existing implementations.
Keywords—Advanced Encryption Standard (AES): Decryption,
and Encryption, FPGA implementation, RTL.
To prevent the loss of information and avoid cyber-attacks,
individuals have to take the necessary precautions to reduce
cyber-crime. This can be achieved only by highly reliable
communication over the internetwork, strong and compact
encryption techniques [1], and a trustworthy third party for
recording the data. Messages should be better protected from
malicious attacks. To transfer data or provide vital
information to clients or users over a universal network such
as the internet, cryptography is used. It ensures that only
the intended recipients can read or access that particular
data. The required protection is usually obtained by
encrypting the information through safety procedures that
depend on the system's hardware and software. Moreover,
attackers use malicious code to corrupt the data and make it
unrecoverable, potentially destroying the entire system.
Hence, it is more effective to use a hardware implementation
rather than software to defend the system from the threats
posed by malicious software.
A hardware implementation increases the area, which should be
minimized for best results. After analyzing the available
cryptographic techniques, the Advanced Encryption Standard
(AES) was identified as the most suitable scheme for
implementation in VLSI [2]. To develop significant ciphers,
AES-based ASICs (Application Specific Integrated Circuits) and
mutant processors are expected to be devised in the future
[3]. AES is a variant of the Rijndael algorithm [4] and is
suitable for both hardware and software implementation. AES
can be run on most processors to meet the output requirements
when power minimization is not the main concern. Hence the
basic requirements in VLSI, namely higher speed, low power,
and small area, can be met by implementing the AES algorithm
in hardware description languages such as VHDL and Verilog.
Two Belgian cryptographers, Vincent Rijmen and Joan Daemen,
created AES [5] as an iterated block cipher that uses a
128-bit block with three key sizes: 128, 192, and 256
bits [6]. AES is also called the Rijndael symmetric
algorithm [7] because it uses the same key for encryption and
decryption of a message, generating a cipher text whose size
equals that of the plain text message.
Depending on the width of the data path, implementations of
AES are classified into three kinds, the first of which is
meant for producing the key for encrypting and decrypting the
data. Encryption and decryption use the same key and operate
on data of four bytes, where each byte is treated as an
element of the Galois field $\mathrm{GF}(2^8)$. Addition in
this field is performed with XOR gates, which is convenient
for computation on computers. Multiplication in
$\mathrm{GF}(2^8)$ is carried out modulo an irreducible binary
polynomial of degree 8, that is, a polynomial that has no
divisors other than 1 and itself.
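The field arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware design: the function names `gf_add` and `gf_mul` are hypothetical, and the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) is taken from the AES standard, since the text does not state it.

```python
# Reduction polynomial x^8 + x^4 + x^3 + x + 1, as fixed by the AES standard.
# (Assumption: the paper does not name the polynomial explicitly.)
AES_POLY = 0x11B


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^8) is a bitwise XOR of the two bytes."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing modulo AES_POLY."""
    result = 0
    while b:
        if b & 1:          # lowest bit of b set: add the current multiple of a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: reduce by XORing the modulus
            a ^= AES_POLY
        b >>= 1
    return result


if __name__ == "__main__":
    print(hex(gf_add(0x57, 0x83)))  # 0xd4
    print(hex(gf_mul(0x57, 0x83)))  # 0xc1
```

The shift-and-XOR loop mirrors what a hardware multiplier does with wiring and XOR gates, which is why the field is convenient for both software and VLSI implementations.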
In AES, data can be encrypted with key sizes of 128, 192, or
256 bits, using 10, 12, or 14 rounds respectively, depending
on the key length. Every round involves four transformations:
Add Round Key, Shift Rows, Sub Bytes, and Mix Columns. First,
Add Round Key performs an XOR operation between the 128-bit
key and the data, and is executed for initialization as shown
in Figure 1. Shift Rows cyclically shifts each row of 4 bytes
by an offset of 0 to 3. Sub Bytes uses the S-box [8, 9], a
substitution table that converts each byte independently of
the other bytes of the current state. It is composed of two
transformations: a multiplicative inverse in the finite field
$\mathrm{GF}(2^8)$ followed by an affine transformation. The
last operation, Mix Columns, treats the 4 bytes in every
column of the block as the coefficients of a 4-term polynomial
and multiplies it by a fixed polynomial modulo $x^4 + 1$.
Finally, decryption is performed to recover the original data.
Figure 1: Advanced Encryption Standard Algorithm
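As a rough software illustration of the Sub Bytes and Mix Columns transformations described above, the sketch below derives one S-box entry from the field inverse plus the affine step and mixes a single column. The names `gf_inv`, `sub_byte`, and `mix_column` are hypothetical, and the constants (0x11B, 0x63, and the coefficients 2 and 3) come from the AES standard rather than from this paper.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):   # a^254 equals a^-1 in a group of order 255
        result = gf_mul(result, a)
    return result


def sub_byte(a: int) -> int:
    """S-box entry: field inverse followed by the affine transformation."""
    x = gf_inv(a)
    y = x
    for shift in range(1, 5):
        # XOR in the byte rotated left by 1, 2, 3 and 4 bit positions
        y ^= ((x << shift) | (x >> (8 - shift))) & 0xFF
    return y ^ 0x63


def mix_column(col):
    """Multiply one 4-byte column by the fixed polynomial modulo x^4 + 1."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3,
        a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3,
        a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3),
        gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3),
    ]


if __name__ == "__main__":
    print(hex(sub_byte(0x53)))                                   # 0xed
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Computing the S-box with logic instead of storing it is the software analogue of the combinational S-box designs cited in [9]. A table-based design would precompute all 256 entries once.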
Encryption and decryption take 10, 12, or 14 rounds, chosen
according to the block size and key length. Each round
produces intermediate data, and a separate key schedule
generates the corresponding round keys from the initial key.
An encryption round is divided into four stages, namely key
addition, byte substitution, shift row, and mix column, as
shown in Figure 1. In decryption, however, the order of the
stages differs even though the round keys for encryption and
decryption are identical, so separate implementations are
needed for the two phases, which is a drawback. An
alternative was developed that uses the same structure for
both phases. To achieve this equivalence, the inverse mix
column is also applied to the round keys. This change makes
the implementation of the encryption and decryption modes
more efficient in terms of area.
In Private-key cryptography, such as DES, a single key is used
for both encryption and decryption operations as shown in
Figure 2. Alternatively, in Public Key cryptography, a pair of
private and public keys is used for the encryption and
decryption operations as shown in Figure 3.
Figure 2: Private-key cryptography
Figure 3: Public-key cryptography
Because of its low cost and good performance, we suggest an
inner-pipelined [10] VLSI implementation of the AES algorithm.
This architecture [11] divides the 128-bit processing in each
normal round into two 64-bit computations, which is less
expensive to implement in hardware [12] and requires
approximately 12000 equivalent gates. This implementation is
mainly intended for portable devices such as mobile phones.
Most hardware implementations [13] of the AES algorithm
process all 128 bits simultaneously in every round [14]. Such
architectures give high throughput, but at high hardware cost.
Because AES is a block computation, the 128-bit calculation
can be divided into two 64-bit computations. This scheme has
reasonable throughput but needs only four Sub Byte modules and
one Mix Column module, compared with the sixteen Sub Byte
modules and four Mix Column modules of the 128-bit parallel
scheme. The Shift Row module is composed only of wires. If one
64-bit computation is done in one clock, one round takes two
clocks to complete. However, this approach needs a complex Mix
Column module that has to be connected in series with the Sub
Byte modules, which lengthens the critical path and decreases
the chip speed.
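The clock-count trade-off described above can be made concrete with a small back-of-the-envelope model. This is only a sketch under stated assumptions: it counts round clocks alone (ignoring the initial key addition and any key-schedule latency), and the 100 MHz clock used in the example is a hypothetical value, not a figure reported for this architecture.

```python
def cycles_per_block(rounds: int = 10, clocks_per_round: int = 2) -> int:
    """Clock cycles to encrypt one 128-bit block, counting round clocks only."""
    return rounds * clocks_per_round


def throughput_mbps(clock_mhz: float, rounds: int = 10,
                    clocks_per_round: int = 2) -> float:
    """Throughput in Mbit/s for a non-overlapped, one-block-at-a-time design."""
    return 128 * clock_mhz / cycles_per_block(rounds, clocks_per_round)


if __name__ == "__main__":
    # Hypothetical 100 MHz clock, AES-128 (10 rounds).
    print(throughput_mbps(100, clocks_per_round=2))  # 64-bit datapath: 640.0
    print(throughput_mbps(100, clocks_per_round=1))  # 128-bit datapath: 1280.0
```

Under these assumptions, halving the datapath width halves the throughput at a fixed clock, so the 64-bit scheme pays off only if the area saving or a faster clock compensates.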
Figure 2 (flow): Plain Text -> Encryption (Private Key) -> Cipher Text -> Decryption (Private Key) -> Plain Text
Figure 3 (flow): Plain Text -> Encryption (Public Key) -> Cipher Text -> Decryption (Private Key) -> Plain Text
A state array, a 4x4 matrix of bytes that represents a 128-bit
block, is shown in Figure 4.
B0 B4 B8  B12
B1 B5 B9  B13
B2 B6 B10 B14
B3 B7 B11 B15
Figure 4: State Matrix
Each column consists of 4 bytes, i.e., 32 bits, and is called a
word. Each round processes an input state array and generates
the corresponding output state array. The last round produces
the output state array, which is rearranged into a 128-bit
output block. AES is a substitution-permutation network: each
round includes byte-level substitution followed by word-level
permutations.
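The column-major layout of the state matrix and the row rotation applied to it can be illustrated as follows. The helper names `bytes_to_state`, `state_to_bytes`, and `shift_rows` are hypothetical, chosen for this sketch only. Byte Bi of the block lands in row i mod 4 and column i div 4.

```python
def bytes_to_state(block: bytes):
    """Arrange a 16-byte block column by column: state[row][col] = block[row + 4*col]."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte (128-bit) blocks")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def state_to_bytes(state) -> bytes:
    """Read the state back out column by column into a 16-byte block."""
    return bytes(state[row][col] for col in range(4) for row in range(4))


def shift_rows(state):
    """Rotate row r of the state left by r byte positions (r = 0..3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


if __name__ == "__main__":
    state = bytes_to_state(bytes(range(16)))   # bytes B0..B15 hold values 0..15
    for row in state:
        print(row)
    print(shift_rows(state)[1])                # [5, 9, 13, 1]
```

Because the shift only reorders bytes, a hardware version needs no logic at all, which matches the earlier remark that the Shift Row module is composed only of wires.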
AES Encryption key Expansion
Key scheduling expands the key matrix of four columns (four
words) into 44 words, as shown in Figure 5.
B0 B4 B8  B12
B1 B5 B9  B13
B2 B6 B10 B14
B3 B7 B11 B15
Figure 5: Key Scheduler
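The expansion of a 128-bit key into 44 words can be sketched in software as shown below. This follows the key schedule of the AES standard rather than the specific scheduler circuit of Figure 5, and the names `SBOX` and `expand_key` are hypothetical. The S-box is generated from the field inverse and affine step so that the block stays self-contained.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _sbox_entry(a: int) -> int:
    """Field inverse (0 maps to 0) followed by the AES affine transformation."""
    inv = 0 if a == 0 else next(x for x in range(1, 256) if gf_mul(a, x) == 1)
    y = inv
    for shift in range(1, 5):
        y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return y ^ 0x63


SBOX = [_sbox_entry(a) for a in range(256)]


def expand_key(key: bytes):
    """Expand a 16-byte key into 44 four-byte words (11 round keys)."""
    if len(key) != 16:
        raise ValueError("this sketch handles 128-bit keys only")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # rotate the word by one byte
            temp = [SBOX[b] for b in temp]      # substitute each byte
            temp[0] ^= rcon                     # add the round constant
            rcon = gf_mul(rcon, 2)              # next round constant
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    words = expand_key(key)
    print(len(words))               # 44
    print(bytes(words[4]).hex())    # a0fafe17
```

Words 0 to 3 are the key itself, and each later group of four words forms the round key for one of the ten rounds.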
The overall block diagram of the algorithm, which contains 10
rounds, is shown in Figure 6. In each round, encryption
performs four steps, namely substitute bytes, shift rows, mix
columns, and add round key, as shown in Figure 7. Before
encryption begins, the input state array is XORed with the
first four words of the key schedule. The last round is the
exception: only substitute bytes, shift rows, and add round
key are performed.
AES decryption also involves four steps, as shown in Figure 7:
inverse shift rows, inverse substitute bytes, add round key,
and inverse mix columns. In the last round, encryption omits
the mix columns step and decryption omits the inverse mix
columns step.
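The round structure described above (initial key addition, nine full rounds, and a final round without mix columns) can be checked end to end with a compact software model. This is a reference sketch of standard AES-128 encryption, not the authors' HDL design, and all function names are hypothetical. The state is kept as a flat list in which the byte at row r, column c sits at index r + 4c.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def _sbox_entry(a):
    """Field inverse (0 maps to 0) followed by the affine transformation."""
    inv = 0 if a == 0 else next(x for x in range(1, 256) if gf_mul(a, x) == 1)
    y = inv
    for s in range(1, 5):
        y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return y ^ 0x63


SBOX = [_sbox_entry(a) for a in range(256)]


def expand_key(key):
    """Return eleven 16-byte round keys derived from a 16-byte key."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(words[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]
            t[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], t)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def add_round_key(s, k):
    return [a ^ b for a, b in zip(s, k)]


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    # Output byte at (row r, col c) comes from input (row r, col (c + r) mod 4).
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [gf_mul(2, a[0]) ^ gf_mul(3, a[1]) ^ a[2] ^ a[3],
                a[0] ^ gf_mul(2, a[1]) ^ gf_mul(3, a[2]) ^ a[3],
                a[0] ^ a[1] ^ gf_mul(2, a[2]) ^ gf_mul(3, a[3]),
                gf_mul(3, a[0]) ^ a[1] ^ a[2] ^ gf_mul(2, a[3])]
    return out


def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block with a 16-byte key (AES-128)."""
    rk = expand_key(key)
    s = add_round_key(list(plaintext), rk[0])            # initial key addition
    for rnd in range(1, 10):                             # nine full rounds
        s = add_round_key(mix_columns(shift_rows(sub_bytes(s))), rk[rnd])
    s = add_round_key(shift_rows(sub_bytes(s)), rk[10])  # final round, no mix columns
    return bytes(s)


if __name__ == "__main__":
    pt = bytes.fromhex("00112233445566778899aabbccddeeff")
    print(encrypt_block(pt, bytes(range(16))).hex())
    # 69c4e0d86a7b0430d8cdb78070b4c55a
```

A model like this is useful as a golden reference when verifying the outputs of an HDL simulation against known test vectors.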
Figure 6: Block Diagram
Figure 7: Process steps in each round
To resolve the computation issues, a pipelined structure [6]
is used to complete every normal round computation. This
pipelined structure achieves high calculation efficiency with
low hardware cost. Inserting one 64-bit register and adding
one more clock enable line is enough to complete one round
computation in five clocks. As a result, the critical path is
shortened and the speed of the chip increases.
To address this problem, we adopt a computation scheme that
reduces the delay as well as the input and output arrival
times.
First, we implemented the 32-bit AES algorithm [15] in a
hardware description language, verified the outputs against
the inputs, and calculated the timing. In the proposed
technique, the number of look-up tables is reduced while the
same number of clock buffers is maintained as in the existing
system. The optimized computation is achieved by reducing the
number of look-up tables used by the proposed algorithm. With
fewer look-up tables, the substitute bytes needed to build the
state matrix during cipher text generation are produced more
easily, which in turn reduces the overall delay of the system.
Figures 8 to 11 show the implementation details.
Table I compares the existing and proposed implementations.
The proposed technique uses less area and has a lower delay,
and it runs at a lower clock frequency.
Figure 8: Implementation of Algorithm for 128 bit key
Figure 9: Synthesis report
Figure 10: Schematic Diagram
Figure 11: RTL Schematic
Table I: Comparison between existing and proposed implementations
Parameter     Existing       Proposed
Registers     7910           2182
Slice LUTs    6221           3778
LUT-FFs       2981           1185
              18             391
BUFG          1              1
Delay         3.513 ns       2.615 ns
Frequency     284.682 MHz    100.857 MHz
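To make the comparison in Table I easier to read, the relative savings can be computed directly from the reported figures. The short script below is only an arithmetic aid: the helper name `reduction_percent` is hypothetical, the values are copied from the table, and the row whose label is missing is left out.

```python
# (existing, proposed) values copied from Table I
TABLE_I = {
    "Registers": (7910, 2182),
    "Slice LUTs": (6221, 3778),
    "LUT-FFs": (2981, 1185),
    "Delay (ns)": (3.513, 2.615),
}


def reduction_percent(existing: float, proposed: float) -> float:
    """Relative reduction of the proposed value with respect to the existing one."""
    return 100.0 * (existing - proposed) / existing


if __name__ == "__main__":
    for name, (existing, proposed) in TABLE_I.items():
        print(f"{name}: {reduction_percent(existing, proposed):.1f}% lower")
    # Registers: 72.4% lower
    # Slice LUTs: 39.3% lower
    # LUT-FFs: 60.2% lower
    # Delay (ns): 25.6% lower
```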
In the proposed design, a hardware description language was
used for the encryption and decryption operations. The
minimum delay obtained from the implementation indicates that
the outputs are produced correctly. AES can thus be
implemented on an FPGA with a delay of about 2.61 ns. This
delay varies from circuit to circuit and should be regarded as
approximate. The presented method also decreases the number of
slices, the number of look-up tables, and the clock frequency.
Accordingly, AES encryption is implemented at a lower
frequency than in the existing architectures. The proposed
design can be used in low-power applications in which area is
not a major concern.
1. Gurpreet Singh, S., A Study of Encryption Algorithms
(RSA, DES, 3DES and AES) for Information Security.
International Journal of Computer Applications,
2013. 67(19).
2. Xinmiao, Z. and K.K. Parhi, High-speed VLSI
architectures for the AES algorithm. IEEE
Transactions on Very Large Scale Integration (VLSI)
Systems, 2004. 12(9): p. 957-967.
3. El-Rayis, A.O., T. Arslan, and A.T. Erdogan.
Addressing Future Space Challenges using
Reconfigurable Instruction Cell Based Architectures.
in 2008 NASA/ESA Conference on Adaptive
Hardware and Systems. 2008.
4. Daemen, J. and V. Rijmen, The Design of Rijndael.
Information Security and Cryptography. Springer-Verlag
Berlin Heidelberg, 2002.
5. Advanced Encryption Standard. Federal Information
Processing Standards, 2001(National Institute of
Standards and Technology).
6. Shivlal Mewada, P.S., S. S. Gautam, Classification of
Efficient Symmetric Key Cryptography Algorithms.
International Journal of Computer Science and
Information Security, 2016. 14(2): p. 105-109.
7. Kumar Sinha, S., V. Rangari, and K. KumarPandey,
An Enhanced Symmetric Key Cryptography
Algorithm to Improve Data Security. International
Journal of Computer Applications, 2013. 74(20): p.
8. Razi Hosseinkhani, S.H.H.S.J., Using Cipher Key to
Generate Dynamic S-Box in AES Cipher System.
International Journal of Computer Science and
Security (IJCSS), 2012. 6(1): p. 19 - 28
9. Munusamy, K., C. Senthilpari, and D.C.K. Kho. A
low power hardware implementation of S-Box for
Advanced Encryption Standard. in 2014 11th
International Conference on Electrical
Engineering/Electronics, Computer,
Telecommunications and Information Technology
(ECTI-CON). 2014.
10. Hodjat, A. and I. Verbauwhede, Area-throughput
trade-offs for fully pipelined 30 to 70 Gbits/s AES
processors. IEEE Transactions on Computers, 2006.
55(4): p. 366-372.
11. Chen, R.J., et al. Architecture Design of High
Efficient and Non-memory AES Crypto Core for
WPAN. in 2009 Third International Conference on
Network and System Security. 2009.
12. J.Vijaya, d.m.r., High speed Low Cost
Implementation of Advanced Encryption Standard on
FPGA. International Journal of Electronics &
Telecommunication and Instrumentation
Engineering, 2010. 2(3): p. 1-7.
13. Shaji, N. and P.L. Bonifus, Design of AES
Architecture With Area and Speed Tradeoff. Procedia
Technology, 2016. 24: p. 1135-1140.
14. Chodowiec, P. and K. Gaj, Very Compact FPGA
Implementation of the AES Algorithm, in
Cryptographic Hardware and Embedded Systems -
CHES 2003: 5th International Workshop, Cologne,
Germany, September 8–10, 2003. Proceedings, C.D.
Walter, Ç.K. Koç, and C. Paar, Editors. 2003,
Springer Berlin Heidelberg: Berlin, Heidelberg. p.
15. Van Dyken, J. and J.G. Delgado-Frias, FPGA
schemes for minimizing the power-throughput trade-
off in executing the Advanced Encryption Standard
algorithm. Journal of Systems Architecture, 2010.
56(2–3): p. 116-123.