Concurrent Error Detection in Multiplexer-Based Multipliers for Normal Basis of GF($2^m$) Using Double Parity Prediction Scheme
Chiou-Yng Lee & Che Wun Chiou & Jim-Min Lin
Received: 11 May 2008 / Revised: 14 March 2009 / Accepted: 16 March 2009 / Published online: 21 April 2009
© 2009 Springer Science + Business Media, LLC. Manufactured in The United States
Abstract Successful implementation of elliptic curve cryptographic systems depends primarily on efficient and reliable arithmetic circuits for finite fields of very large order. Robust encryption/decryption algorithms are therefore urgently needed. Multiplication is the most important finite field arithmetic operation: it is much more complex than finite field addition, and it is used frequently in point operations on elliptic curve groups. The hardware implementation of a multiplication operation may require millions of logic gates, so a fault in any of them may lead to erroneous outputs. To obtain reliable cryptographic applications, this article proposes a novel concurrent error detection (CED) architecture that uses a parity prediction scheme to detect erroneous outputs in a multiplexer-based normal basis (NB) multiplier over GF($2^m$). Although various NB multipliers, depending on $\alpha\alpha^{2^i} = \sum_{j=0}^{m-1} t_{i,j}\alpha^{2^j}$, have different time and space complexities, NB multipliers have the same structure if they use a parity prediction function. By using the structure of the proposed CED NB multiplier, a CED scalable multiplier over composite fields with a 100% error detection rate is also presented.
Keywords Finite fields · Cryptography · Fault detection · Double parity prediction · Side-channel attack · Normal basis
1 Introduction
Finite fields GF($2^m$) are of great interest for their applications in elliptic curve cryptography (ECC) and error control coding. In recent years, they have received much attention due to the emergence of ECC as a potential candidate for realizing robust cryptosystems in resource-constrained environments and lightweight applications [1]. With the rapid advancement of very-large-scale integration (VLSI) technology and the growing popularity of smart cards and secure communication through portable devices, it is highly desirable to design dedicated circuits that integrate ECC for high-volume applications. Finite field arithmetic is frequently encountered in ECC to perform basic operations such as point addition and point doubling in elliptic curve groups. Since no carry propagation occurs in GF($2^m$), the addition of two single bits requires only a logical XOR operation. Division, on the other hand, can be performed either by a lookup-table arrangement or through a series of multiplications. Therefore, designing an efficient multiplier with CED capability is highly desirable in order to improve the reliability of dedicated cryptographic hardware.
J Sign Process Syst (2010) 58:233–246
DOI 10.1007/s11265-009-0361-4
C.-Y. Lee (*)
Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology, Taoyuan County 333, Taiwan, Republic of China
e-mail: PP010@mail.lhu.edu.tw
C. W. Chiou
Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li 320, Taiwan, Republic of China
e-mail: cwchiou@cyu.edu.tw
J.-M. Lin
Department of Information Engineering and Computer Science, Feng Chia University, Taichung City 407, Taiwan, Republic of China
e-mail: jimmy@fcu.edu.tw
For most cryptographic applications [2, 3], the field size commonly ranges from 160 to 2048 bits. A hardware multiplier of such large order requires more than a million transistors, and it is likely that one or more faulty transistors could lead to an incorrect output in the computation of a field multiplication.
Moreover, a reliable finite field multiplication architecture plays a critical role in today's VLSI designs. Several research works over the last few decades have therefore addressed concurrent error detection (CED) for digital electronic circuits. The design of efficient multipliers with CED capability is highly desirable for reliable, dedicated cryptographic hardware. It is generally accepted that the parity prediction technique is the most economical approach to the concurrent detection of permanent faults [4, 5]. Fault detection architectures can be derived either with a parity prediction method or with a time-redundant method that detects errors in the results. The traditional time redundancy approach employs the same hardware circuit to carry out an operation repeatedly. Since the same hardware circuit is used, repeated operations produce the same erroneous results in the presence of faults, so the erroneous results cannot be identified by this approach. To overcome this problem, repeated operations using shifted operands were proposed in [6]. Another commonly used approach is based on augmenting finite field multipliers over GF($2^m$) with a parity prediction scheme [7–9]. Fenn et al. [7] proposed an on-line fault detection method for a bit-serial multiplier in GF($2^m$). Reyhani-Masoleh and Hasan [8] provided a generic parity prediction scheme for both bit-parallel and bit-serial polynomial basis multipliers. Lee et al. [9] then proposed a bit-parallel systolic dual basis multiplier with CED capability.
All the aforementioned techniques, however, do not exploit the rich mathematical structures that constitute the foundation of many cryptographic schemes, especially in the normal basis representation of GF($2^m$). The major advantage of the normal basis (NB) representation is that the squaring of an element can be performed simply by cyclically shifting its binary form. Normal basis multiplication was first invented by Massey and Omura [10]. Various efficient bit-parallel and bit-serial architectures for normal basis multiplication over GF($2^m$) have been developed in [11–13, 23]. For this reason, this article proposes a multiplexer-based normal basis multiplier over GF($2^m$). Moreover, we introduce a "double parity bits" scheme in the proposed normal basis multiplier. By applying our CED multiplier architecture, the proposed CED scalable multiplier over composite fields can obtain an error detection rate of about 100%.
2 Preliminaries
In this section, we briefly review the normal basis
multiplication algorithm and the parity prediction scheme.
2.1 Traditional Normal Basis Multiplication Over GF($2^m$)
It is well known that the finite field GF($2^m$) can be viewed as a vector space of dimension m over GF(2). Suppose that $\alpha$ is a normal element of GF($2^m$). Then any element A in GF($2^m$) can be represented as
$$A = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^{2^{m-1}} = (a_0, a_1, \ldots, a_{m-1}),$$
where the coordinates $a_i \in$ GF(2) for $0 \le i \le m-1$ and $N = \{\alpha, \alpha^2, \ldots, \alpha^{2^{m-1}}\}$ is the normal basis (NB) of GF($2^m$). In a hardware implementation, the squaring of A can be performed easily by right cyclic shifts, i.e., $A^{2^i} = (a_{m-i}, a_{m-i+1}, \ldots, a_{m-1}, a_0, a_1, \ldots, a_{m-i-1})$.
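As a small illustration of the shift property, the sketch below squares a normal-basis element by rotating its coordinate list. It relies only on $(\sum_j a_j\alpha^{2^j})^2 = \sum_j a_j\alpha^{2^{j+1}}$ and $\alpha^{2^m} = \alpha$; the function name `nb_square` and the list-of-bits encoding are assumptions made for this example, not part of the original design.

```python
def nb_square(coords, times=1):
    """Return the coordinates of A^(2^times) for A given in a normal basis.

    coords[j] is the coefficient of alpha^(2^j). Squaring moves it to
    alpha^(2^(j+1)), which is a right cyclic shift of the coordinate list.
    """
    m = len(coords)
    s = times % m
    return coords[m - s:] + coords[:m - s]


A = [1, 0, 1, 1, 0]            # (a0, a1, a2, a3, a4) in GF(2^5)
print(nb_square(A))            # coordinates of A^2
print(nb_square(A, len(A)))    # m squarings give back A
```

No field arithmetic is involved, which is why squaring is essentially free in a normal-basis hardware design.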
Definition 1 [14] Let p = mt + 1 represent a prime number and gcd(mt/k, m) = 1, where k denotes the multiplicative order of 2 modulo p. Let $\gamma$ be a primitive p-th root of unity. By employing $\alpha = \gamma + \gamma^{2^m} + \cdots + \gamma^{2^{m(t-1)}}$, we can generate a normal basis N for GF($2^m$) over GF(2). This basis is called the Gaussian normal basis (GNB).
Remark 1 If $\alpha = \gamma + \gamma^{2^m} + \cdots + \gamma^{2^{m(t-1)}}$ with t odd generates a normal basis of GF($2^m$), then m is an even number, because the prime p = mt + 1 is odd and mt must therefore be even.
Significantly, GNBs exist for GF($2^m$) whenever m is not divisible by 8. Definition 1 characterizes exactly which finite fields have normal bases generated by general Gauss periods. For more information on fast arithmetic under normal bases generated by Gauss periods, readers may refer to [14].
By adopting Definition 1, each element A of GF($2^m$) can also be given as
$$A = a_0\left(\gamma + \gamma^{2^m} + \cdots + \gamma^{2^{m(t-1)}}\right) + a_1\left(\gamma^{2} + \gamma^{2^{m+1}} + \cdots + \gamma^{2^{m(t-1)+1}}\right) + \cdots + a_{m-1}\left(\gamma^{2^{m-1}} + \gamma^{2^{2m-1}} + \cdots + \gamma^{2^{mt-1}}\right) \tag{1}$$
From $\gamma^p = 1$, the set N can also be translated into the redundant basis $R = \{\gamma, \gamma^2, \ldots, \gamma^{p-1}\}$. Therefore, the normal basis element $A = (a_0, a_1, \ldots, a_{m-1})$ can be written in the redundant basis representation as
$$A = a_{F(1)}\gamma + a_{F(2)}\gamma^2 + \cdots + a_{F(p-1)}\gamma^{p-1} \tag{2}$$
where
$$F(2^i 2^{mj} \bmod p) = i, \quad 0 \le i \le m-1,\ 0 \le j \le t-1.$$
For example, let m = 5 and t = 2, so that p = 11. We use $\alpha = \gamma + \gamma^{\langle 2^m \rangle_p} = \gamma + \gamma^{10}$ to generate the normal basis $\{\alpha, \alpha^2, \alpha^{2^2}, \alpha^{2^3}, \alpha^{2^4}\}$. Then we obtain the redundant basis $\{\gamma, \gamma^2, \gamma^3, \gamma^4, \gamma^5, \gamma^6, \gamma^7, \gamma^8, \gamma^9, \gamma^{10}\}$. In this way, $A = (a_0, a_1, \ldots, a_{m-1})$ can also be represented by the redundant representation
$$A = a_0\gamma + a_1\gamma^2 + a_3\gamma^3 + a_2\gamma^4 + a_4\gamma^5 + a_4\gamma^6 + a_2\gamma^7 + a_3\gamma^8 + a_1\gamma^9 + a_0\gamma^{10}.$$
Observing this representation, each coefficient of the original normal basis element $A = (a_0, a_1, \ldots, a_{m-1})$ is duplicated t times if the field element A is represented in a type-t normal basis of GF($2^m$). Thus, by using the function $F(2^i 2^{mj} \bmod p) = i$, the field element $A = (a_{F(1)}, a_{F(2)}, \ldots, a_{F(p-1)})$ in the redundant representation can be transformed into the following representation:
$$A = (\underbrace{a_0, \ldots, a_0}_{t}, a_1, \ldots, a_1, a_2, \ldots, a_2, \ldots, a_{m-1}, \ldots, a_{m-1}).$$
Therefore, the redundant basis representation can easily be converted into the normal basis element.
Let $A = (a_0, a_1, \ldots, a_{m-1})$ and $B = (b_0, b_1, \ldots, b_{m-1})$ indicate two normal basis elements in GF($2^m$), and let $C = (c_0, c_1, \ldots, c_{m-1})$ represent their product, i.e., C = AB. Coordinate $c_i$ of C (with coordinate subscripts taken modulo m) can then be represented by
$$c_i = Q\left(A^{(i)}, B^{(i)}\right) = \sum_{j=1}^{p-2} a_{F(j+1)+i}\, b_{F(p-j)+i} \tag{3}$$
where $A^{(i)}$ denotes the left circular shift of the element A by i positions. Therefore, the product of A and B can be computed as in Algorithm 1. This algorithm for multiplication in GF($2^m$) uses the type-t GNB as described in IEEE Standard 1363-2000 [2].
Algorithm 1 (Conventional normal basis multiplication)
Input: $A, B \in$ GF($2^m$) and Q(A, B) in (3)
Output: C = AB
Step 1. C ← 0
Step 2. For i = 0 to m−1 do
  2.1. $c_i$ ← Q(A, B)
  2.2. A ← $A^{(1)}$, B ← $B^{(1)}$
Step 3. Return C
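The following is a minimal software sketch of Algorithm 1 for an even type t, under stated assumptions: the table F is built from $F(2^i 2^{mj} \bmod p) = i$ with $2^m \bmod p$ assumed to have order t modulo p, and Q is read as the sum over $a_{F(j+1)}\,b_{F(p-j)}$ applied to operands shifted left by i positions. The names `build_F` and `nb_multiply` are hypothetical. For odd t, IEEE Std 1363-2000 adds further terms that this sketch does not include.

```python
def build_F(m, t):
    """Table F with F[2^i * 2^(m*j) mod p] = i, for p = m*t + 1 prime."""
    p = m * t + 1
    u = pow(2, m, p)                      # assumed to have order t modulo p
    F = {}
    for i in range(m):
        for j in range(t):
            F[(pow(2, i, p) * pow(u, j, p)) % p] = i
    assert len(F) == p - 1, "2 and 2^m do not cover all nonzero residues mod p"
    return F


def nb_multiply(A, B, t):
    """Bit-level Gaussian normal basis product C = AB (even t only)."""
    m = len(A)
    p = m * t + 1
    F = build_F(m, t)
    C = []
    for i in range(m):                    # Step 2: one coordinate per iteration
        ci = 0
        for j in range(1, p - 1):         # j = 1 .. p-2
            # Adding i to the subscript models the left shift by i positions.
            ci ^= A[(F[j + 1] + i) % m] & B[(F[p - j] + i) % m]
        C.append(ci)
    return C


A = [1, 0, 1, 1, 0]
B = [0, 1, 1, 0, 1]
print(nb_multiply(A, B, t=2))             # product in the type-2 GNB of GF(2^5)
print(nb_multiply(A, [1] * 5, t=2))       # the all-ones vector is the field element 1
```

Indexing with `(F[...] + i) % m` replaces the explicit operand rotation of Step 2.2; a hardware design would instead rotate the registers and reuse one Q circuit.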
The number of product terms $a_ib_j$ in Q(A, B), denoted by $C_N$, is known as the complexity of the normal basis. It is well known that $C_N \ge 2m-1$. If $C_N = 2m-1$, the NB is called an optimal NB. Assuming that all gates are 2-input gates, the implementation of $c_i$ requires $C_N$ AND gates and $C_N - 1$ XOR gates. Hence, type-1 and type-2 GNBs have minimal space complexity. When t > 2, however, the space complexity increases considerably.
2.2 Concurrent Error Detection Scheme
Arithmetic operations combined with a parity prediction scheme can help to detect errors in the results. In its basic form, a word includes m data bits and one extra parity bit. The parity bit is defined as follows:
Definition 2 [8] Let the field element A in GF($2^m$) be represented by $A = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^{2^{m-1}}$. The parity $P_A$ of the field element A is defined as the bit $P_A = a_0 + a_1 + \cdots + a_{m-1} \bmod 2$.
Lemma 1 [8] Let $g \in$ GF(2) and A be an element of GF($2^m$). The parity bit of the field element gA is given by $P_{gA} = g \cdot P_A$.
The multiplier with a concurrent error detection (CED) circuit using the parity prediction scheme is shown in Fig. 1. This structure includes the arithmetic circuit under test, a parity generator and an equality checker. The parity generator uses the two input signals A and B to produce the predicted parity bit $\hat{P}_C$ of C. The equality checker compares the signals $P_C$ and $\hat{P}_C$, i.e., $\hat{e}_C = \hat{P}_C + P_C$, where $P_C = c_0 + c_1 + \cdots + c_{m-1}$. The error signal $\hat{e}_C = 1$ indicates that a stuck-at fault has corrupted the result C.
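The checker in Fig. 1 can be mimicked in a few lines. This is only a behavioural sketch: `parity` and `error_flag` are hypothetical names, and the predicted parity is passed in as a plain bit rather than derived from A and B.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """P_A = a_0 + a_1 + ... + a_{m-1} mod 2 (Definition 2)."""
    return reduce(xor, bits, 0)


def error_flag(result_bits, predicted_parity):
    """Equality checker: 1 when the actual and predicted parities differ."""
    return parity(result_bits) ^ predicted_parity


C = [1, 0, 1, 1, 0]
predicted = parity(C)                  # stands in for the parity generator output
faulty = C[:]
faulty[2] ^= 1                         # a single flipped output bit
print(error_flag(C, predicted))        # fault-free case
print(error_flag(faulty, predicted))   # single-bit error case
```

A single parity bit flags any odd number of flipped output bits; an even number of flips leaves the parity unchanged and goes unnoticed.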
3 Proposed Multiplexer-based Normal Basis Multiplier Over GF($2^m$)
To facilitate the representation of the normal basis
multiplication, we will give the following definitions.
Definition 3 Let $A = (a_0, a_1, \ldots, a_{m-1})$ be an NB element in GF($2^m$). We define
$$A_i = a_i\alpha + a_{i+1}\alpha^2 + \cdots + a_{m-1}\alpha^{2^{m-1-i}} \tag{4}$$
Thus, the field element A can be represented by
$$A = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^{2^{m-1}} = a_0\alpha + \left(a_1\alpha + \cdots + a_{m-1}\alpha^{2^{m-2}}\right)^2 = a_0\alpha + A_1^2 \tag{5}$$
Lemma 2 Let $A_i = a_i\alpha + A_{i+1}^2$ and $B_i = b_i\alpha + B_{i+1}^2$. Their product can be represented by $A_iB_i = Z_i\alpha + (A_{i+1}B_{i+1})^2$, where $Z_i = a_ib_i\alpha + b_iA_{i+1}^2 + a_iB_{i+1}^2$.
Figure 1 Multiplier architecture with concurrent error detection capability.
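Proof Given $A_i = a_i\alpha + A_{i+1}^2$ and $B_i = b_i\alpha + B_{i+1}^2$ as in (5), the product $A_iB_i$ is expanded to obtain
$$A_iB_i = \left(a_i\alpha + A_{i+1}^2\right)\left(b_i\alpha + B_{i+1}^2\right)$$
$$= a_ib_i\alpha^2 + a_i\alpha B_{i+1}^2 + b_i\alpha A_{i+1}^2 + A_{i+1}^2B_{i+1}^2$$
$$= \alpha\left(a_ib_i\alpha + a_iB_{i+1}^2 + b_iA_{i+1}^2\right) + A_{i+1}^2B_{i+1}^2$$
$$= \alpha Z_i + A_{i+1}^2B_{i+1}^2$$
■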
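Based on Lemma 2, the product C = AB can be obtained by
$$C = AB = \left(a_0\alpha + A_1^2\right)\left(b_0\alpha + B_1^2\right)$$
$$= \alpha Z_0 + (A_1B_1)^2$$
$$= \alpha Z_0 + (\alpha Z_1)^2 + \left((A_2B_2)^2\right)^2$$
$$= \cdots$$
$$= \alpha Z_0 + (\alpha Z_1)^2 + \cdots + (\alpha Z_{m-1})^{2^{m-1}} \tag{6}$$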
Applying the modified Booth's recoding scheme, the term $a_iB_{i+1}^2 + b_iA_{i+1}^2$ in the function $Z_i$ can be evaluated by using the alternatives in Table 1. The table clearly shows that $a_iB_{i+1}^2 + b_iA_{i+1}^2$ requires actual computation only when $a_i = b_i = 1$. Hence, the value of $a_iB_{i+1}^2 + b_iA_{i+1}^2$ can be selected from one of the items 0, $B_{i+1}^2$, $A_{i+1}^2$ and $B_{i+1}^2 + A_{i+1}^2$, depending on the values of $a_i$ and $b_i$. Therefore, we can use only (m−1−i) 4×1 multiplexers and one AND gate to determine the value of $Z_i$ when the value of $B_{i+1}^2 + A_{i+1}^2$ is precomputed, as shown in Fig. 2.
Figure 2 The detailed circuit of the $Z_i$ module.
Notably, $Z_{m-1}$ equals $a_{m-1}b_{m-1}\alpha$ since $A_m = B_m = 0$. Thus, $Z_{m-1}$ can be realized with only one AND gate.
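The multiplexer selection can be sketched in software as follows. The sketch assumes the coordinate form $z_{i,0} = a_ib_i$ and $z_{i,j} = a_ib_{i+j} + b_ia_{i+j}$ that follows from Lemma 2 and Definition 3; the function name `z_module` and the ordering of the multiplexer data inputs are illustrative assumptions, not taken from Fig. 2.

```python
def z_module(a, b, i):
    """Coefficients (z_{i,0}, ..., z_{i,m-1-i}) of Z_i.

    One AND gate gives z_{i,0}; each remaining coefficient comes from a 4x1
    multiplexer whose select lines are (b_i, a_i) and whose data inputs are
    0, b_{i+j}, a_{i+j} and the precomputed sum a_{i+j} + b_{i+j}.
    """
    m = len(a)
    z = [a[i] & b[i]]                          # the single AND gate
    select = 2 * b[i] + a[i]                   # same select value for every mux
    for j in range(1, m - i):
        data = (0, b[i + j], a[i + j], a[i + j] ^ b[i + j])
        z.append(data[select])
    return z


a = [1, 0, 1, 1, 0]
b = [0, 1, 1, 0, 1]
for i in range(len(a)):
    print(i, z_module(a, b, i))                # Z_{m-1} has a single coefficient
```

Because the select value depends only on $a_i$ and $b_i$, all multiplexers of one $Z_i$ module switch together, and no AND/XOR network is needed beyond the precomputed sums.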
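In (6), $\alpha Z_i$ is the critical operation of a normal basis multiplication. From Section 2, recall that $\alpha = \gamma + \gamma^{2^m} + \cdots + \gamma^{2^{m(t-1)}}$ generates a normal basis of GF($2^m$), and that the field element A can be represented in the redundant basis. Thus, the computation of $\alpha A$ can be simplified:
$$\alpha A = \left(\gamma + \gamma^{2^m} + \cdots + \gamma^{2^{m(t-1)}}\right)A$$
$$= A\gamma + A\gamma^{2^m} + \cdots + A\gamma^{2^{m(t-1)}}$$
$$= A^{(1)} + A^{(2^m \bmod p)} + \cdots + A^{(2^{m(t-1)} \bmod p)}$$
$$= \bar{a}_{F(0)} + \bar{a}_{F(1)}\gamma + \cdots + \bar{a}_{F(p-1)}\gamma^{p-1} \tag{7}$$
where
$$A^{(i)} = A\gamma^i = \sum_{j=0}^{p-1} a_{F(j-i)}\gamma^j, \qquad \bar{a}_{F(i)} = \sum_{k=0}^{t-1} a_{F(i-2^{mk})}.$$
Here $\bar{a}_{F(i)}$ denotes the accumulated coefficient of $\gamma^i$ in $\alpha A$, the arguments of F are taken modulo p, and a term whose argument is 0 modulo p is absent because A has no $\gamma^0$ component. Since $1 = \gamma + \gamma^2 + \cdots + \gamma^{p-1}$, the constant term is added to every other coefficient when converting from the redundant basis to the NB, so $\alpha A$ in (7) can be represented by
$$\alpha A = \sum_{i=0}^{m-1}\left(\bar{a}_{F(2^i)} + \bar{a}_{F(0)}\right)\alpha^{2^i} \tag{8}$$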
Remark 2 Suppose that an optimal normal basis (ONB) exists in GF($2^m$). Then $\alpha A_i$ requires (m−1−i) XOR gates.
Table 1 The $Z_i$ function decision table.
$a_i$   $b_i$   $a_iB_{i+1}^2 + b_iA_{i+1}^2$
0       0       0
1       0       $B_{i+1}^2$
0       1       $A_{i+1}^2$
1       1       $B_{i+1}^2 + A_{i+1}^2$
Example 1 Let $A = (a_0, a_1, a_2, a_3, a_4)$ be an NB element of GF($2^5$), and let $\alpha = \gamma + \gamma^{10}$ be used to generate the NB. Applying the redundant representation, the field element A can be represented by
$$A = a_0\gamma + a_1\gamma^2 + a_3\gamma^3 + a_2\gamma^4 + a_4\gamma^5 + a_4\gamma^6 + a_2\gamma^7 + a_3\gamma^8 + a_1\gamma^9 + a_0\gamma^{10}.$$
Then, $\alpha A$ is obtained by
$$\alpha A = (\gamma + \gamma^{10})A = A^{(1)} + A^{(10)}$$
where
$$A^{(1)} = a_0 + a_0\gamma^2 + a_1\gamma^3 + a_3\gamma^4 + a_2\gamma^5 + a_4\gamma^6 + a_4\gamma^7 + a_2\gamma^8 + a_3\gamma^9 + a_1\gamma^{10}$$
$$A^{(10)} = a_0 + a_1\gamma + a_3\gamma^2 + a_2\gamma^3 + a_4\gamma^4 + a_4\gamma^5 + a_2\gamma^6 + a_3\gamma^7 + a_1\gamma^8 + a_0\gamma^9$$
Therefore, from the redundant basis to the NB, we obtain $\alpha A = (a_1, a_0 + a_3, a_3 + a_4, a_2 + a_1, a_2 + a_4)$. In a hardware implementation, only four XOR gates are needed to compute $\alpha A$.
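Example 1 can be replayed in software by carrying out the same three steps: expand A into the redundant basis, add the copies shifted by the exponents of $\gamma$ that appear in $\alpha$, and fold the constant term back using $1 = \gamma + \cdots + \gamma^{p-1}$. The sketch below is illustrative only; the name `alpha_times` is hypothetical and $2^m \bmod p$ is assumed to have order t modulo p.

```python
def alpha_times(A, t):
    """Coordinates of alpha*A in a type-t Gaussian normal basis of GF(2^m)."""
    m = len(A)
    p = m * t + 1
    u = pow(2, m, p)                              # assumed to have order t mod p
    K = [pow(u, k, p) for k in range(t)]          # exponents of gamma in alpha
    # Redundant representation: the coefficient of gamma^e is a_{F(e)}.
    red = [0] * p
    for i in range(m):
        for k in K:
            red[(pow(2, i, p) * k) % p] = A[i]
    # Multiplying by gamma^k shifts exponents cyclically modulo p.
    prod = [0] * p
    for k in K:
        for e in range(1, p):
            prod[(e + k) % p] ^= red[e]
    # Fold the constant term into every coordinate: 1 = gamma + ... + gamma^(p-1).
    return [prod[pow(2, i, p)] ^ prod[0] for i in range(m)]


a0, a1, a2, a3, a4 = A = [1, 0, 1, 1, 0]
print(alpha_times(A, t=2))
print([a1, a0 ^ a3, a3 ^ a4, a2 ^ a1, a2 ^ a4])   # closed form from Example 1
```

Both printed lists should agree for every choice of A when m = 5 and t = 2.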
Applying (6), Fig. 3 shows the multiplexer-based normal basis multiplier over GF($2^m$). The architecture includes PS, $Z_i$, $CS_i$, $\alpha_i$ and sum modules. The $Z_i$ module uses one AND gate and (m−1−i) 4×1 multiplexers to calculate the function $Z_i = a_ib_i\alpha + a_iB_{i+1}^2 + b_iA_{i+1}^2$, as shown in Fig. 2. The $\alpha_i$ module performs the computation of $\alpha Z_i$. Each PS module is obtained simply by permuting $A_{i-1} + B_{i-1}$, $A_{i-1}$ and $B_{i-1}$ into $A_i + B_i$, $A_i$ and $B_i$, respectively. The $CS_i$ module performs a right cyclic shift by i positions.
4 Proposed Parity Prediction Function for Individual Modules
In this section, a single fault model is assumed. We investigate the parity prediction scheme for the multiplexer-based normal basis multiplier over GF($2^m$). As mentioned in the previous section, the proposed normal basis multiplier is composed of $Z_i$, $\alpha_i$ and sum modules. In the following, the proposed parity prediction functions for the $Z_i$ and $\alpha_i$ modules are discussed. The parity prediction function for the sum module is introduced in the next section.
4.1 Parity Prediction of $Z_i$
From Lemma 2, the function $Z_i$ is represented by
$$Z_i = a_ib_i\alpha + a_iB_{i+1}^2 + b_iA_{i+1}^2 = z_{i,0}\alpha + z_{i,1}\alpha^2 + \cdots + z_{i,m-1-i}\alpha^{2^{m-1-i}}$$
where
$$z_{i,0} = a_ib_i; \qquad z_{i,j} = a_ib_{i+j} + b_ia_{i+j}, \ \text{for } 1 \le j \le m-1-i.$$
The parity prediction of $Z_i$ is calculated by
$$\hat{P}_{Z_i} = z_{i,0} + z_{i,1} + \cdots + z_{i,m-1-i} = a_ib_i + a_i\hat{P}_{B_{i+1}} + b_i\hat{P}_{A_{i+1}}. \tag{9}$$
According to (9), a modified $Z_i$ module, called $Z'_i$, is shown in Fig. 4. This modified module requires only two extra AND gates and one extra XOR gate to generate the predicted parity bit $\hat{P}_{Z_i}$.
4.2 Parity Prediction of $\alpha_i$ Module
The $\alpha_i$ module performs the computation of $\alpha Z_i$. In the following, the parity prediction of $\alpha_i$ is discussed in two cases: even t and odd t.
Figure 3 The proposed bit-parallel normal basis multiplier over GF($2^m$).
4.2.1 Case 1: Even t
According to Remark 1, we have F(p−i) = F(i). Applying the relation $F(2^i 2^{mj}) = i$, it is easy to show that $2^{mt/2} \equiv mt \bmod p$ and $2^{mt/2 + mi} \equiv -2^{mi} \bmod p$. For instance, if we let m = 7 and t = 4, then p = mt + 1 = 29 is prime. We have $2^{mt/2} = 2^{14} \equiv 28 \bmod 29$, $2^{mt/2+m} \equiv -2^{7} \bmod 29$ and $2^{mt/2+2m} \equiv -2^{14} \bmod 29$. Therefore, $\bar{a}_{F(0)}$ is computed by
$$\bar{a}_{F(0)} = a_{F(-1)} + a_{F(-2^m)} + \cdots + a_{F(-2^{m(t-1)})}$$
$$= a_{F(2^0 2^{m\frac{t}{2}})} + a_{F(2^0 2^{m(\frac{t}{2}+1)})} + \cdots + a_{F(2^0 2^{m(\frac{t}{2}+t-1)})}$$
$$= \underbrace{a_0 + a_0 + \cdots + a_0}_{t} = 0$$
Thus, $\alpha A$ in (8) can be re-expressed by
$$\alpha A = \sum_{i=0}^{m-1} \bar{a}_{F(2^i)}\alpha^{2^i} \tag{10}$$
Lemma 3 Let $\alpha = \gamma + \gamma^{2^m} + \cdots + \gamma^{2^{m(t-1)}}$ with an even t generate the field GF($2^m$), and let $A = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^{2^{m-1}}$ be a normal basis element of GF($2^m$). Among all coefficients $\bar{a}_{F(2^i)}$ in (10), there are t terms of $a_i$ for $1 \le i \le m-1$ and (t−1) terms of $a_0$.
We use the following example to illustrate the property of Lemma 3.
Example 2 To illustrate the computation of $\alpha A$, we use the field GF($2^7$). A type-4 normal basis of GF($2^7$) exists since 4 × 7 + 1 = 29 is prime. Let $A = \sum_{i=0}^{6} a_i\alpha^{2^i}$ be a field element of GF($2^7$). Assume that $\alpha = \gamma + \gamma^{12} + \gamma^{17} + \gamma^{28}$ generates the normal basis of GF($2^7$). According to (10), $\alpha A$ is then computed by
$$\alpha A = (a_4 + a_4 + a_1)\alpha + (a_0 + a_2 + a_6 + a_5)\alpha^{2}$$
$$+ (a_5 + a_3 + a_4 + a_1)\alpha^{2^2} + (a_5 + a_2 + a_3 + a_3)\alpha^{2^3}$$
$$+ (a_6 + a_2 + a_0 + a_0)\alpha^{2^4} + (a_1 + a_3 + a_6 + a_2)\alpha^{2^5}$$
$$+ (a_1 + a_6 + a_4 + a_5)\alpha^{2^6}$$
Observing the above equation, we see that each $a_i$ appears in four terms, for $1 \le i \le 6$, whereas $a_0$ appears in three terms.
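The term counts in Example 2 can be reproduced mechanically. The sketch below lists, for each coordinate of $\alpha A$, which $a_i$ enter the sum, using the coefficient of $\gamma^{2^i}$ in the redundant representation; the constant term is ignored because it vanishes for even t. The function name `alpha_term_indices` is hypothetical, and $2^m \bmod p$ is again assumed to have order t modulo p.

```python
from collections import Counter


def alpha_term_indices(m, t):
    """For each coordinate i of alpha*A, list the subscripts of the a's summed there."""
    p = m * t + 1
    u = pow(2, m, p)
    K = [pow(u, k, p) for k in range(t)]          # exponents of gamma in alpha
    F = {(pow(2, i, p) * k) % p: i for i in range(m) for k in K}
    terms = []
    for i in range(m):
        e = pow(2, i, p)
        # An exponent (e - k) equal to 0 modulo p has no coefficient in A.
        terms.append([F[(e - k) % p] for k in K if (e - k) % p != 0])
    return terms


terms = alpha_term_indices(7, 4)
for i, idx in enumerate(terms):
    print(f"coordinate of alpha^(2^{i}):", " + ".join(f"a{j}" for j in idx))

counts = Counter(j for idx in terms for j in idx)
print(dict(counts))        # a0 occurs t-1 = 3 times, every other a_i occurs t = 4 times
```

Since every $a_i$ with $i \ge 1$ occurs an even number of times and $a_0$ an odd number of times, adding all coordinates leaves only $a_0$.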
According to Lemma 3, the parity prediction of $\alpha A$ is then obtained by
$$\hat{P}_{\alpha A} = a_0. \tag{11}$$
Generally, the parity prediction of $\alpha Z_i$ can be obtained by
$$\hat{P}_{\alpha Z_i} = z_{i,0} = a_ib_i. \tag{12}$$
According to (12), the modified $\alpha_i$ module for computing $\alpha Z_i$, denoted by the $\alpha'_i$ module, is shown in Fig. 5. The $\alpha'_i$ module can generate the predicted parity bit $\hat{P}_{\alpha Z_i}$ without any extra hardware.
4.2.2 Case 2: Odd t
Figure 4 The detailed circuit of the $Z'_i$ module.
Figure 5 The detailed circuit of the $\alpha'_i$ module for an even-type normal basis of GF($2^m$).
According to Remark 1, applying the relation $F(2^i 2^{mj} \bmod p) = i$, it is easy to show that $2^{\frac{m}{2}} 2^{\frac{m(t-1)}{2}} \equiv mt \bmod p$ and $2^{\frac{m}{2}} 2^{\frac{m(t-1)}{2} + mi} \equiv -2^{mi} \bmod p$. For instance, if we let m = 12 and t = 3, then p = mt + 1 = 37 is prime. We have $2^{\frac{m}{2}} 2^{\frac{m(t-1)}{2}} = 2^{6} 2^{12} \equiv 27 \cdot 26 \equiv 36 \bmod 37$, $2^{\frac{m}{2}} 2^{\frac{m(t-1)}{2} + m} \equiv -2^{m} = -2^{12} \bmod 37$ and $2^{\frac{m}{2}} 2^{\frac{m(t-1)}{2} + 2m} \equiv -2^{24} \bmod 37$. Therefore, $\bar{a}_{F(0)}$ in (8) is calculated by
$$\bar{a}_{F(0)} = a_{F(-1)} + a_{F(-2^m)} + \cdots + a_{F(-2^{m(t-1)})}$$
$$= a_{F(2^{\frac{m}{2}} 2^{m\frac{t-1}{2}})} + a_{F(2^{\frac{m}{2}} 2^{m(\frac{t-1}{2}+1)})} + \cdots + a_{F(2^{\frac{m}{2}} 2^{m(\frac{t-1}{2}+t-1)})}$$
$$= \underbrace{a_{\frac{m}{2}} + a_{\frac{m}{2}} + \cdots + a_{\frac{m}{2}}}_{t} = a_{\frac{m}{2}}$$
Thus, $\alpha A$ can be re-expressed by
$$\alpha A = \sum_{i=0}^{m-1}\left(\bar{a}_{F(2^i)} + a_{\frac{m}{2}}\right)\alpha^{2^i} \tag{13}$$
Lemma 4 Let $\alpha = \gamma + \gamma^{2^m} + \cdots + \gamma^{2^{m(t-1)}}$ with an odd t be used to generate the field GF($2^m$), and let $A = a_0\alpha + a_1\alpha^2 + \cdots + a_{m-1}\alpha^{2^{m-1}}$ be a normal basis element of GF($2^m$). Among all coefficients $\bar{a}_{F(2^i)}$ in (13), for $0 \le i \le m-1$, there are t terms of each $a_i$ except for $a_{m/2}$, which has (t−1) terms.
Therefore, the parity prediction of $\alpha A$ is then obtained by
$$\hat{P}_{\alpha A} = a_{\frac{m}{2}} + \hat{P}_A. \tag{14}$$
As mentioned above, the parity prediction of $\alpha Z_i$ can be obtained by
$$\hat{P}_{\alpha Z_i} = z_{i,\frac{m}{2}} + \hat{P}_{Z_i} \tag{15}$$
According to (15), Fig. 6 shows the $\alpha'_i$ module for odd t. The $\alpha'_i$ module uses one XOR gate to generate the predicted parity bit $\hat{P}_{\alpha Z_i}$.
5 Fault Detection Architecture in Normal Basis Multiplier Using Double Parity Bits
In the previous section, two parity prediction functions $\hat{P}_{Z_i}$ and $\hat{P}_{\alpha Z_i}$ were derived for the two individual modules $Z_i$ and $\alpha_i$, respectively. In the following, we consider how the double parity bits are used to detect errors in the multiplexer-based normal basis multiplier with CED capability.
Lemma 5 [8] Let A and B be two normal basis elements in GF($2^m$), and S be their multiplication (without modular reduction). Then, $\hat{P}_S = \hat{P}_A\hat{P}_B$.
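Lemma 6 Let $A = (a_0, a_1, \ldots, a_{m-1})$ and $B = (b_0, b_1, \ldots, b_{m-1})$ be two even-type normal basis elements in GF($2^m$), and C be their product. Then, $\hat{P}_C = \sum_{i=0}^{m-1} a_ib_i$.
Proof In (6), the product C is computed by $C = \alpha Z_0 + (\alpha Z_1)^2 + \cdots + (\alpha Z_{m-1})^{2^{m-1}}$. From the concept of (9), the parity prediction of C is then given by
$$\hat{P}_C = \hat{P}_{\alpha Z_0 + (\alpha Z_1)^2 + \cdots + (\alpha Z_{m-1})^{2^{m-1}}} = \hat{P}_{\alpha Z_0} + \hat{P}_{\alpha Z_1} + \cdots + \hat{P}_{\alpha Z_{m-1}} \tag{16}$$
Thus, $\hat{P}_C$ can be calculated by
$$\hat{P}_C = \sum_{i=0}^{m-1}\hat{P}_{\alpha Z_i} = \sum_{i=0}^{m-1} z_{i,0} = \sum_{i=0}^{m-1} a_ib_i \tag{17}$$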
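To see the two parity checks working together, the sketch below evaluates the decomposition (6) bit by bit for an even type t and derives both flags: one compares the summed parities of the $Z_i$ outputs with $\hat{P}_A\hat{P}_B$ (Lemma 5), the other compares the parity of C with $\sum a_ib_i$ (Lemma 6). It is a behavioural model only; the function names, the `fault` argument that flips one $z_{i,j}$ bit, and the assumption that $2^m \bmod p$ has order t are all illustrative.

```python
def alpha_times(X, t):
    """alpha*X via the redundant basis (2^m mod p assumed to have order t)."""
    m = len(X)
    p = m * t + 1
    K = [pow(pow(2, m, p), k, p) for k in range(t)]
    red = [0] * p
    for i in range(m):
        for k in K:
            red[(pow(2, i, p) * k) % p] = X[i]
    prod = [0] * p
    for k in K:
        for e in range(1, p):
            prod[(e + k) % p] ^= red[e]
    return [prod[pow(2, i, p)] ^ prod[0] for i in range(m)]


def rot_right(X, s):
    s %= len(X)
    return X[len(X) - s:] + X[:len(X) - s]


def parity(bits):
    return sum(bits) % 2


def ced_multiply(A, B, t, fault=None):
    """Return (C, e_S, e_C); fault=(i, j) flips z_{i,j} to mimic a faulty Z_i module."""
    m = len(A)
    C = [0] * m
    P_S = 0
    for i in range(m):
        Z = [A[i] & B[i]] + [(A[i] & B[i + j]) ^ (B[i] & A[i + j])
                             for j in range(1, m - i)]
        if fault is not None and fault[0] == i:
            Z[fault[1]] ^= 1
        P_S ^= parity(Z)
        term = rot_right(alpha_times(Z + [0] * i, t), i)      # (alpha*Z_i)^(2^i)
        C = [c ^ x for c, x in zip(C, term)]
    e_S = P_S ^ (parity(A) & parity(B))                        # Lemma 5
    e_C = parity(C) ^ parity([a & b for a, b in zip(A, B)])    # Lemma 6, even t
    return C, e_S, e_C


A, B = [1, 0, 1, 1, 0], [0, 1, 1, 0, 1]
print(ced_multiply(A, B, t=2))                  # fault-free: both flags are 0
print(ced_multiply(A, B, t=2, fault=(0, 1)))    # flipped z_{0,1}: e_S becomes 1
```

A flipped $z_{i,j}$ always changes the summed $Z_i$ parity, so the first flag catches it; whether the second flag also rises depends on where the flipped bit lands in C.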
Lemma 7 Let $A = (a_0, a_1, \ldots, a_{m-1})$ and $B = (b_0, b_1, \ldots, b_{m-1})$ be two odd type-t normal basis elements in GF($2^m$), and C be their product. Then, $\hat{P}_C = \hat{P}_A\hat{P}_B + \sum_{i=0}^{\frac{m}{2}-1} z_{i,\frac{m}{2}}$, where $z_{i,\frac{m}{2}} = a_ib_{\frac{m}{2}+i} + b_ia_{\frac{m}{2}+i}$.
Proof Applying (15), the parity prediction of C is calculated by
$$\hat{P}_C = \sum_{i=0}^{m-1}\hat{P}_{\alpha Z_i} = \sum_{i=0}^{m-1} z_{i,\frac{m}{2}} + \sum_{i=0}^{m-1}\hat{P}_{Z_i}$$
Since $A_{i+1} = a_{i+1}\alpha + a_{i+2}\alpha^2 + \cdots + a_{m-1}\alpha^{2^{m-2-i}}$ and $B_{i+1} = b_{i+1}\alpha + b_{i+2}\alpha^2 + \cdots + b_{m-1}\alpha^{2^{m-2-i}}$, we have $Z_i = a_ib_i\alpha + a_iB_{i+1}^2 + b_iA_{i+1}^2 = z_{i,0}\alpha + z_{i,1}\alpha^2 + \cdots + z_{i,m-1-i}\alpha^{2^{m-1-i}}$, where $z_{i,0} = a_ib_i$ and $z_{i,j} = a_ib_{i+j} + b_ia_{i+j}$, for $1 \le j \le m-1-i$. When $i \ge m/2$, $z_{i,\frac{m}{2}} = 0$. Moreover, based on Lemma 5, $\hat{P}_C$ can also be re-expressed by
$$\hat{P}_C = \hat{P}_A\hat{P}_B + \sum_{i=0}^{\frac{m}{2}-1} z_{i,\frac{m}{2}} \tag{18}$$
■
Figure 6 The detailed circuit of the $\alpha'_i$ module for an odd type-t normal basis of GF($2^m$).
The proposed multiplier with CED capability for an even type-t normal basis of GF($2^m$) is shown in Fig. 7. Using Lemmas 5 and 6, the predicted parity bits $\hat{P}_S$ and $\hat{P}_C$ are compared to the actual parity bits $P_S$ and $P_C$, respectively. In this figure, the flag $e_S = P_S + \hat{P}_S$ indicates the presence of a single fault in the $Z_i$ module, and the flag $e_C = P_C + \hat{P}_C$ indicates the presence of a single fault in the $\alpha_i$ and sum modules. When $e_S = e_C = 0$, the situation is fault-free. When $e_S = 1$, there is a single stuck-at fault in the $Z_i$ module. Similarly, there is a single stuck-at fault in the $\alpha_i$ and sum modules if $e_C = 1$.
The proposed multiplier with CED capability for an odd type-t normal basis of GF($2^m$) employs Lemmas 5 and 7. In Fig. 7, the $\alpha'_i$ module for odd t is used, and the two input signals of the AND1 gate are replaced by $\hat{P}_A$ and $\hat{P}_B$, respectively. Figure 7 then performs the odd type-t NB multiplication with CED capability. As mentioned above, the proposed fault detection architecture for all types (even and odd type-t) of NB multiplier is also the one shown in Fig. 7, because even and odd type-t NB multipliers have the same parity prediction architecture.
6 Efficient CED Scalable Multiplier Over Composite Fields
In this section, multiplications over GF($2^m$) where m is a composite number are considered. A novel CED scalable multiplier built from the proposed CED NB multiplier in Fig. 7 will be derived. For these composite fields, efficient multipliers have been developed in [13, 15, 17]. Menezes et al. [16] showed that the composite fields GF($2^{nk}$) satisfy the following lemma:
Lemma 8 Let gcd(n, k) = 1. Let $N = \{\beta, \beta^2, \ldots, \beta^{2^{n-1}}\}$ be a normal basis of GF($2^n$) over GF(2). Then, N is also a normal basis of GF($2^{nk}$) over GF($2^k$).
According to Lemma 8, Oh et al. [13] demonstrated that NB multiplication over the composite field GF($2^m$), with m = nk and gcd(n, k) = 1, using type-1 and type-2 normal bases can be classified into three cases: (a) the subfield GF($2^n$) uses a type-2 normal basis and the extension field GF($2^{nk}$) uses a type-1 normal basis; (b) the subfield GF($2^n$) uses a type-1 normal basis and the extension field GF($2^{nk}$) uses a type-2 normal basis; (c) the subfield GF($2^n$) uses a type-2 normal basis and the extension field GF($2^{nk}$) uses a type-2 normal basis. For clarity, we consider the case in which the extension field GF($2^{nk}$) uses a type-2 normal basis to develop the scalable NB multiplier with CED.
Recall that a Toeplitz matrix [18] is one that is constant along its diagonals. Therefore, all the entries in a Toeplitz matrix are uniquely determined by the first row and first column of the matrix. In other words, with rows and columns indexed from 0, the (i, j) entry of a Toeplitz matrix T is given by $t_{j-i}$. Recall that a Hankel matrix is one that is constant along its antidiagonals. Therefore, a Hankel matrix H is determined by its first column and last row. In other words, if H denotes a k×k Hankel matrix, its (i, j) entry is defined by a value in a list h of length 2k−1, namely $h_{i+j-k+1}$. Note that a Hankel matrix is always symmetric.
Figure 7 The proposed NB multiplier with CED capability.
By using a matrix-vector computation, Hankel and Toeplitz multiplications can be represented by
$$\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{k-1} \end{bmatrix} = \begin{bmatrix} h_{-k+1} & h_{-k+2} & \cdots & h_{-1} & h_0 \\ h_{-k+2} & h_{-k+3} & \cdots & h_0 & h_1 \\ \vdots & \vdots & & \vdots & \vdots \\ h_0 & h_1 & \cdots & h_{k-2} & h_{k-1} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-1} \end{bmatrix} = HA,$$
$$\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{k-1} \end{bmatrix} = \begin{bmatrix} t_0 & t_1 & \cdots & t_{k-2} & t_{k-1} \\ t_{-1} & t_0 & \cdots & t_{k-3} & t_{k-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ t_{-k+1} & t_{-k+2} & \cdots & t_{-1} & t_0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-1} \end{bmatrix} = TA.$$
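Let $A = (a_0, a_1, \ldots, a_{k-1})$ and $B = (b_0, b_1, \ldots, b_{k-1})$ be two type-2 NB elements over GF($2^k$), and C be their product. Then, the product C can be written in the following form [19]:
$$\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{k-2} \\ c_{k-1} \end{bmatrix} = \begin{bmatrix} b_1 & b_2 & \cdots & b_{k-1} & b_{k-1} \\ b_2 & b_3 & \cdots & b_{k-1} & b_{k-2} \\ \vdots & \vdots & & \vdots & \vdots \\ b_{k-1} & b_{k-1} & \cdots & b_2 & b_1 \\ b_{k-1} & b_{k-2} & \cdots & b_1 & b_0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-2} \\ a_{k-1} \end{bmatrix} + \begin{bmatrix} 0 & b_0 & \cdots & b_{k-3} & b_{k-2} \\ b_0 & 0 & \cdots & b_{k-4} & b_{k-3} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ b_{k-3} & b_{k-4} & \cdots & 0 & b_0 \\ b_{k-2} & b_{k-3} & \cdots & b_0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{k-2} \\ a_{k-1} \end{bmatrix} = (B_1 + B_2)A \tag{19}$$
In (19), the two matrices $B_1$ and $B_2$ are a Hankel matrix and a Toeplitz matrix, respectively. By combining Lemma 8 with (19), the following lemma is obtained.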
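Lemma 9 Let $A = (A_0, A_1, \ldots, A_{k-1})$ and $B = (B_0, B_1, \ldots, B_{k-1})$ be two elements of GF($2^{nk}$) over GF($2^n$), and C be their product. Then,
$$\begin{bmatrix} C_0 \\ C_1 \\ \vdots \\ C_{k-2} \\ C_{k-1} \end{bmatrix} = \begin{bmatrix} B_1 & B_2 & \cdots & B_{k-1} & B_{k-1} \\ B_2 & B_3 & \cdots & B_{k-1} & B_{k-2} \\ \vdots & \vdots & & \vdots & \vdots \\ B_{k-1} & B_{k-1} & \cdots & B_2 & B_1 \\ B_{k-1} & B_{k-2} & \cdots & B_1 & B_0 \end{bmatrix} \begin{bmatrix} A_0 \\ A_1 \\ \vdots \\ A_{k-2} \\ A_{k-1} \end{bmatrix} + \begin{bmatrix} 0 & B_0 & \cdots & B_{k-3} & B_{k-2} \\ B_0 & 0 & \cdots & B_{k-4} & B_{k-3} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ B_{k-3} & B_{k-4} & \cdots & 0 & B_0 \\ B_{k-2} & B_{k-3} & \cdots & B_0 & 0 \end{bmatrix} \begin{bmatrix} A_0 \\ A_1 \\ \vdots \\ A_{k-2} \\ A_{k-1} \end{bmatrix} = (B'_1 + B'_2)A \tag{20}$$
where $A_i = (a_{i,0}, a_{i,1}, \ldots, a_{i,n-1})$ and $B_i = (b_{i,0}, b_{i,1}, \ldots, b_{i,n-1})$ are the subfield coordinates of A and B.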
From the structure of (20), let $B_1(i, j)$ and $B_2(i, j)$ denote the (i, j)th entries of $B'_1$ and $B'_2$, respectively. The scalable multiplication over composite fields then leads to the following algorithm.
Algorithm 2 (Scalable NB multiplication)
Input: A=(A0,A1,…,Ak1) and B=(B0,B1,…,Bk1) are two
elements of GF(2nk) over GF(2n)
Output: C=AB
1. C=(C0,C1,…,Ck1)=0
2. For i=0 to k1 {
3.For j=0 to k1 {
4.Temp=B1(i,j)+B2(i,j)
5.Ci,j=Ci1,j+Temp×Ai
6.}
7. }
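The loop structure of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the hardware datapath of the paper: the helper names are hypothetical, the entries B1(i,j) and B2(i,j) are generated from the index pattern of (20), and the subfield product is itself computed with the type-2 NB form of (19), which presumes that both n and k admit a type-2 normal basis and that gcd(n, k) = 1 (here n = 2 and k = 3 are used for the check).

```python
from itertools import product

def fold(i, j, k):
    """Coordinate index selected by the Hankel part at position (i, j)."""
    return i + j + 1 if i + j + 1 <= k - 1 else 2 * k - 2 - i - j

def xor(u, v):
    """Subfield addition: bitwise XOR of two n-bit tuples."""
    return tuple(x ^ y for x, y in zip(u, v))

def subfield_multiply(a, b):
    """Type-2 NB product of two n-bit tuples, following the form of (19)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            t = b[fold(i, j, n)] ^ (b[abs(i - j) - 1] if i != j else 0)
            c[j] ^= t & a[i]
    return tuple(c)

def scalable_multiply(A, B):
    """Algorithm 2: A and B are length-k tuples of n-bit subfield elements."""
    k, n = len(A), len(A[0])
    zero = (0,) * n
    C = [zero] * k                                            # Step 1
    for i in range(k):                                        # Step 2
        for j in range(k):                                    # Step 3
            b2 = B[abs(i - j) - 1] if i != j else zero        # B2(i,j)
            temp = xor(B[fold(i, j, k)], b2)                  # Step 4
            C[j] = xor(C[j], subfield_multiply(temp, A[i]))   # Step 5
    return tuple(C)

# Check over GF((2^2)^3): the element whose coordinates are all (1, 1)
# behaves as the multiplicative identity.
subfield = list(product([0, 1], repeat=2))
elements = list(product(subfield, repeat=3))
one = ((1, 1),) * 3
assert all(scalable_multiply(one, x) == x for x in elements)
```

The list entry C[j] plays the role of register C in the text: after the pass for a given i it holds C_{i,j}, and after the last pass it holds the coordinate C_j of the product.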
In Algorithm 2, Step 4 performs the subfield addition,
and Step 5 carries out the subfield multiplication. We can
use our proposed CED NB multiplier in Fig. 7 to realize
the concurrent error detection in the subfield multiplication
over GF(2^n). Figure 8 shows the proposed scalable
multiplier with CED. The proposed architecture includes
one subfield multiplier, two registers and (n+1) XOR
gates. The subfield multiplier is constructed from Fig. 7 to
detect errors in subfield multiplication over GF(2^n). In the
beginning step, the two registers are initially set to logical
zero, and the three parity bits P_{B1(i,j)}, P_{B2(i,j)} and P_{A_i} are
predetermined. In every clock cycle, the result C_{i,j} is
computed by C_{i,j}=C_{i-1,j}+Temp×A_i in Step 5 and is stored
in the register C. The parity bit P_{C_{i,j}} is computed by
P_{C_{i,j}}=P_{C_{i-1,j}}+P_{Temp×A_i} in Step 5 and is stored in the
parity register P_C. Two indicators, e_{C_{i,j}} and e_S, monitor the
subfield multiplication. After k^2 clock cycles, the register
C holds the product of A and B. The
proposed scalable multiplier with CED requires a delay of only one
XOR gate and one subfield multiplication.
Figure 8 The proposed scalable NB multiplier with CED capability.
7 Analysis of Time and Area Complexities
In this section, we discuss the error detection probability in
our proposed architecture in Fig. 8. The time and space
overheads are also evaluated.
7.1 Error Detection Probability
Based on the single stuck-at fault assumption, we use Fig. 8
to analyze the error detection probability. In Fig. 8, the
subfield multiplier is extended by Fig. 7 for computing the
subfield multiplication over GF(2^n). Assume that the
probability of having an error due to a fault on the subfield
multiplication of Fig. 7 is p, for e_{C_{i,j}}=1. If E_{i,j} denotes
the probability of an undetected error in the (i,j)th subfield
multiplication, then E_{i+1,j} can be obtained by
\[
\begin{aligned}
E_{i+1,j} &= p\,O_{i,j} + (1-p)E_{i,j} \\
&= p(1-E_{i,j}) + (1-p)E_{i,j} \\
&= (1-2p)E_{i,j} + p
\end{aligned}
\tag{21}
\]
where O_{i,j} is the probability of a detected error in the (i,j)th
subfield multiplication. Since C_j is a subfield coordinate of
the product C in (20), we obtain C_j=(B1(0,j)+
B2(0,j))A_0+(B1(1,j)+B2(1,j))A_1+…+(B1(k-1,j)+B2(k-1,j))
A_{k-1}. From Algorithm 2, the (i,j)th subfield multiplication is
performed by C_{i,j}=C_{i-1,j}+(B1(i,j)+B2(i,j))A_i. As i=k-1, we
have C_j=C_{k-1,j}. By using the recursive computation in (21),
the probability of undetected errors in the result C_j can be
obtained by
\[
\begin{aligned}
E(C_j) &= E_{k-1,j} \\
&= (1-2p)E_{k-2,j} + p \\
&= (1-2p)^2 E_{k-3,j} + (1-2p)p + p \\
&= \cdots \\
&= (1-2p)^k E_{-1,j} + (1-2p)^{k-1}p + \cdots + (1-2p)p + p.
\end{aligned}
\tag{22}
\]
Notably, E_{-1,j}=1 since the input data are correct. Then,
we have
\[
\begin{aligned}
E(C_j) &= (1-2p)^k + (1-2p)^{k-1}p + \cdots + (1-2p)p + p \\
&= (1-2p)^k + p\,\frac{(1-2p)^k - 1}{(1-2p) - 1} \\
&= \frac{(1-2p)^k + 1}{2}.
\end{aligned}
\tag{23}
\]
In fact, the estimated E(C_j) in (23) includes the zero
inputs with the probability (1-p)^k. Therefore, E(C_j) should
be modified with the following formula
\[
E(C_j) = \frac{(1-2p)^k + 1}{2} - (1-p)^k
\tag{24}
\]
Since the C_j in the structure of Fig. 8 are all independent,
the probability of detected error in the NB multiplication is
then given by
\[
\Pr(D_C) = 1 - \bigl(E(C_j)\bigr)^n
= 1 - \left( \frac{(1-2p)^k + 1}{2} - (1-p)^k \right)^n
\tag{25}
\]
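The recursion and its closed form can be checked numerically with a short Python sketch. The function names are hypothetical, and the code is only a direct transcription of (21) to (25) as written above, with E starting from 1 as stated for the error-free input.

```python
def undetected_recursive(p, k):
    """Iterate E <- (1 - 2p) E + p, k times, starting from E = 1 (see (21), (22))."""
    e = 1.0
    for _ in range(k):
        e = (1 - 2 * p) * e + p
    return e

def undetected_closed(p, k):
    """Closed form (23): ((1 - 2p)^k + 1) / 2."""
    return ((1 - 2 * p) ** k + 1) / 2

def detection_probability(p, k, n):
    """Pr(D_C) of (25), with E(C_j) corrected for zero inputs as in (24)."""
    e = undetected_closed(p, k) - (1 - p) ** k
    return 1 - e ** n

# The k-step recursion and the closed form agree for any fault probability p.
for p in (0.01, 0.1, 0.25, 0.5):
    assert abs(undetected_recursive(p, 5) - undetected_closed(p, 5)) < 1e-12
```

For instance, detection_probability(0.01, 5, 6) evaluates (25) for a composite field with k = 5 and n = 6.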
The error detection probabilities of the CED NB multiplier over
composite fields, estimated using (24), are shown in
Fig. 8. We consider the field GF(2^30) with k=5 and n=6 to
analyze the error detection probability in our proposed
scalable NB multiplier. When p=0.5, the error detection
probability is about 99.9%. When p=0.01, our proposed CED
scalable multiplier has 100% error detection probability.
Note that, when a fault is injected into the designed multiplication
architecture, the results from the
actual simulation may contain an even
number of errors with 50% probability. Thus, the error detection capability of
traditional bit-parallel PB multipliers [8] using a single parity
prediction scheme is about 50%. Using multiple
parity prediction schemes, Bayat-Sarmadi and Hasan [5] proposed
CED multipliers and showed that, by using
8 parity bits, the error detection probability could be
increased to 0.996. Therefore, in terms of error
detection probability, our proposed scalable multiplier is
equivalent to the CED multiplier architecture [5], with 100%
error detection probability.

Table 2 Comparison of various type-2 NB multipliers over GF(2^m).
Multipliers   Architecture   #AND   #XOR                #MUX4×1        Time delay
SKM [21]      Bit-parallel   m^2    1.5(m^2 - m)        -              T_A + (1 + ⌈log2 m⌉)T_X
MOM [10]      Bit-parallel   m^2    2m(m - 1)           -              T_A + (1 + ⌈log2(2m - 1)⌉)T_X
RMHM [11]     Bit-parallel   m^2    3m(m - 1)/2         -              T_A + (1 + ⌈log2 m⌉)T_X
Fig. 3        Bit-parallel   m      (3m^2 - m + 6)/4    (m^2 - m)/2    T_MUX + (2 + ⌈log2 m⌉)T_X

Table 3 Comparison of various type-1 NB multipliers over GF(2^m).
Multipliers   #AND   #XOR                #MUX4×1        Time delay
KSM [22]      m^2    1.5(m^2 - m)        -              T_A + (1 + ⌈log2 m⌉)T_X
MOM [10]      m^2    2m(m - 1)           -              T_A + (1 + ⌈log2(2m - 1)⌉)T_X
RMHM [11]     m^2    m^2 - 1             -              T_A + (1 + ⌈log2 m⌉)T_X
Fig. 3        m      (m^2 + 3m - 2)/2    (m^2 - m)/2    T_MUX + (2 + ⌈log2 m⌉)T_X

Table 4 Comparison of various digit-serial multipliers for type-2 NB of GF(2^m) with m = nk.
Multipliers   Architecture   #AND              #XOR                  #MUX4×1        #Latch   Time delay
GSM [17]      Digit-serial   nm                n(2m - 2)             -              3m       k(T_A + (3 + log2 m)T_X)
RMHM [15]     Digit-serial   n(2m - n + 1)/2   n(3m - n - 2)         -              3m       k(T_A + (1 + ⌈log2 m⌉)T_X)
Fig. 8        Scalable       n                 (3n^2 + 7n + 10)/4    (n^2 - n)/2    m + k    k^2(T_MUX + (3 + ⌈log2 n⌉)T_X)

Table 5 Comparison of various CED multipliers using single parity prediction scheme.
Multipliers                    RMHM [8] in Fig. 4     Lee et al. [9]           Fig. 7
Structure                      Bit-parallel           Bit-parallel systolic    Bit-parallel
Basis                          Polynomial             Dual                     Normal
Space overhead                 #XOR: 4m - 2           #XOR: m^2 + 4m           Even-type NB: #XOR: 5m - 4, #AND: 2m - 2
                               #AND: m                #AND: 2m^2 + m           Odd-type NB: #XOR: 4.5m - 1, #AND: 3m
                                                      #Latch: 2m^2 + 7m
Time overhead in every clock   (⌈log2 m⌉ + 1)T_X      T_A + 2T_X               (⌈log2 m⌉ + 1)T_X

Table 6 Comparison of existing CED bit-serial multiplier using multiple parity prediction schemes.
Multipliers      BSHM [5] with p parity bits                 Fig. 8
Structure        Bit-serial                                  Scalable
Basis            Polynomial                                  Normal
Latency          m                                           k^2
Space overhead   #XOR: m + 4p                                Even-type NB: #XOR: 5n - 2, #AND: 2n - 2, #Latch: k
                 #AND: 2m + 3p                               Odd-type NB: #XOR: 4.5n + 1, #AND: 3n, #Latch: k
                 #Latch: 2p
Time overhead    T_A + (1 + ⌈log2 q⌉ + ⌈log2 p⌉)T_X          (1 + ⌈log2 n⌉)T_X
Note: q = ⌈m/p⌉
7.2 Time and Area Overheads
We analyzed the two proposed multipliers in terms of time and
area complexities, and compared them with other bit-parallel
multipliers. The first architecture in Fig. 3 includes
PS, Z_i, CS_i, α_i and sum modules. PS and CS_i incur no
extra hardware cost. Each Z_i module requires one AND
gate and (m-1-i) MUX4×1 to calculate the function Z_i (see
Lemma 2). Basically, the structure of every α_i module
depends on αα^{2^i} = Σ_{j=0}^{m-1} t_{i,j} α^{2^j} and may have different
space complexity. Every α_i module for the type-2 normal basis
in Fig. 3 is responsible for calculating αZ_i, which demands
(m-1-2i) XOR gates. For the type-1 normal basis, every α_i
module for calculating αZ_i requires no extra hardware cost.
The time delay of the proposed multiplexer-based type-2
NB multiplier in Fig. 3 is then T_MUX + (2 + ⌈log2 m⌉)T_X.
Considering some real circuits, the transistor count [20] was based on standard
CMOS VLSI, in which a 2-input XOR gate, a 2-input AND
gate, a 4×1 MUX, and a 1-bit latch require 6, 6, 16, and
8 transistors, respectively. Tables 2 and 3 compare
our multipliers with traditional type-2 and type-1
NB multipliers [10, 11, 21, 22]. More precisely,
the proposed type-2 NB multiplier over
GF(2^233) saves about 16.6% transistor count as compared to
existing type-2 NB multipliers [10, 11, 21]; and the type-1
NB multiplier over GF(2^268) saves about 8.3% transistor
count as compared to existing type-1 NB multipliers [10,
11, 22]. Moreover, we also make a comparison between the
proposed scalable multiplier in Fig. 8 and the existing
digit-serial NB multipliers [15, 17], as shown in Table 4. When
these composite fields GF(2^m) with m=nk are applied to
elliptic curve cryptosystems, Galbraith and Smart [24]
described that the chosen m value must be large enough
to resist the attacks. Considering such cryptographic
applications using finite field multiplication, we find that, in the
field GF(2^690) with n=30 and k=23, our proposed scalable
multiplier can save about 22.3% time×area complexity as
compared to traditional digit-serial multipliers [15, 17].
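The transistor-count comparison can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the per-gate costs are the ones quoted above (6, 6, 16 and 8 transistors), the gate counts are the formulas listed in Table 2 for the SKM multiplier [21] and for the multiplier of Fig. 3, and the function names are hypothetical. The integer divisions assume that the formulas evaluate to whole numbers for the chosen m, which holds for m = 233.

```python
# Transistors per cell in standard CMOS, as quoted in the text.
TRANSISTORS = {"and": 6, "xor": 6, "mux4x1": 16, "latch": 8}

def transistor_count(n_and=0, n_xor=0, n_mux=0, n_latch=0):
    """Total transistor count for the given numbers of gates and latches."""
    return (n_and * TRANSISTORS["and"] + n_xor * TRANSISTORS["xor"]
            + n_mux * TRANSISTORS["mux4x1"] + n_latch * TRANSISTORS["latch"])

def skm_type2(m):
    """SKM [21] row of Table 2: m^2 AND gates and 1.5(m^2 - m) XOR gates."""
    return transistor_count(n_and=m * m, n_xor=3 * (m * m - m) // 2)

def proposed_type2(m):
    """Fig. 3 row of Table 2: m AND, (3m^2 - m + 6)/4 XOR, (m^2 - m)/2 MUX4x1."""
    return transistor_count(n_and=m,
                            n_xor=(3 * m * m - m + 6) // 4,
                            n_mux=(m * m - m) // 2)

def saving(m):
    """Relative transistor saving of the Fig. 3 multiplier over SKM."""
    return 1 - proposed_type2(m) / skm_type2(m)

print(f"{saving(233):.1%}")
```

For m = 233 the two counts are 812238 and 677806 transistors, a saving of roughly 16.6%, in line with the figure quoted above for GF(2^233).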
Recently, applying the single parity prediction scheme,
Reyhani-Masoleh and Hasan [8] provided bit-parallel and
bit-serial polynomial basis multipliers with CED capability. A
dual basis multiplier with CED capability was also presented
by Lee et al. [9]. The major drawback of CED multipliers is
that the parity generator must be compatible with the original multiplier
architecture. Comparing our proposed architecture with
polynomial basis and dual basis multipliers with CED, Table 5
shows the complexity overheads of the fault detection
architectures in terms of time and area. It reveals that
the proposed CED NB multiplier differs from the RMH multiplier
[8]. In terms of time overhead, our proposed bit-parallel
structure is equivalent to the RMH multiplier. Although the dual
basis multiplier with CED capability [9] has lower time
overhead, its extra space overhead is higher than that of
Fig. 7. In the composite fields, we use the structure of Fig. 7
to establish a new scalable multiplier with CED capability.
Table 6 reveals that its time and area overheads are lower
than those of the existing CED bit-serial multiplier [5].
8 Conclusions
In this paper, a multiplexer-based method is utilized for
implementing bit-parallel and scalable NB multipliers
over GF(2^m). The parity prediction scheme is employed to
derive efficient NB multipliers with CED capability.
Although traditional NB multipliers have
their own time and space complexities according to the
equation αα^{2^i} = Σ_{j=0}^{m-1} t_{i,j} α^{2^j}, we have demonstrated that NB
multipliers can retain the same structure when applying the
parity prediction function. Our CED architecture differs
from existing CED multiplier architectures [5, 8] that are
compatible with the common multiplier architecture. Moreover,
using the developed multiplexer-based NB multiplier
with CED, a new CED scalable multiplier over composite
fields is also presented. Analytical results reveal that, in the
CED scalable multiplier, the error detection probability is
about 100%. We also estimated that this circuit, in the case
of GF(2^690), saves about 22.3% time×area complexity as
compared with existing digit-serial multipliers.
References
1. Online. Available: http://www.csrc.nist.gov/publications.
2. IEEE Standard 1363-2000, "IEEE Standard Specifications for
Public-Key Cryptography," Jan. 2000.
3. Nat'l Inst. of Standards and Technology, Digital Signature
Standard, FIPS Publication 186-2, Jan. 2000.
4. Huang, K. H., & Abraham, J. A. (1984). Algorithm-based fault
tolerance for matrix operations. IEEE Transactions on Computers,
33(6), 518–522. doi:10.1109/TC.1984.1676475.
5. Bayat-Sarmadi, S., & Hasan, M. A. (2007). On concurrent
detection of errors in polynomial basis multiplication. IEEE
Transactions on Very Large Scale Integration (VLSI) Systems,
15(4), 413–426. doi:10.1109/TVLSI.2007.893659.
6. Chiou, C. W., Lee, C. Y., Deng, A. W., & Lin, J. M. (2006).
Concurrent error detection in Montgomery multiplier over
GF(2^m). IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences, E89-A(2), 566–574.
doi:10.1093/ietfec/e89-a.2.566.
7. Fenn, S., Gossel, M., Benaissa, M., & Taylor, D. (1998). On-line
error detection for bit-serial multipliers in GF(2^m). Journal of
Electronic Testing: Theory and Applications, 13, 29–40.
doi:10.1023/A:1008333132366.
8. Reyhani-Masoleh, A., & Hasan, M. A. (2006). Fault detection
architectures for field multiplication using polynomial bases.
IEEE Transactions on Computers, 55(9), 1089–1103.
doi:10.1109/TC.2006.147.
9. Lee, C. Y., Chiou, C. W., & Lin, J. M. (2005). Concurrent error
detection in a bit-parallel systolic multiplier for dual basis of
GF(2^m). Journal of Electronic Testing: Theory and Applications,
21(5), 539–549. doi:10.1007/s10836-005-1053-z.
10. Massey, J. L., & Omura, J. K. (1986). Computational Method and
Apparatus for Finite Field Arithmetic. Patent: U.S. 4,587,627,
May 1986.
11. Reyhani-Masoleh, A., & Hasan, M. A. (2002). A new construction of
Massey-Omura parallel multiplier over GF(2^m). IEEE Transactions
on Computers, 51(5), 511–520. doi:10.1109/TC.2002.1004590.
12. Lu, C.-C. (1997). A search of minimal key functions for normal
basis multipliers. IEEE Transactions on Computers, 46(5),
588–592. doi:10.1109/12.589230.
13. Oh, S., Lim, C. H., & Cheon, D. H. (2000). Efficient normal basis
multipliers in composite fields. IEEE Transactions on Computers,
49(10), 1133–1138. doi:10.1109/12.888054.
14. Feisel, S., von zur Gathen, J., & Shokrollahi, M. (1999). Normal
bases via general Gauss periods. Mathematics of Computation,
68, 271–290. doi:10.1090/S0025-5718-99-00988-6.
15. Reyhani-Masoleh, A., & Hasan, M. A. (2005). Low complexity
word-level sequential normal basis multipliers. IEEE Transactions
on Computers, 54(2), 98–110. doi:10.1109/TC.2005.29.
16. Menezes, A. J., Blake, I. F., Gao, X., Mullin, R. C., Vanstone,
S. A., & Yaghoobian, T. (1993). Applications of finite fields.
Kluwer international series in engineering and computer science.
ISBN: 0792392825.
17. Gao, L., & Sobelman, G. E. (2000). Improved VLSI designs for
multiplication and inversion in GF(2^m) over normal bases. Proc.
13th Ann. IEEE Int'l ASIC/SOC Conf., pp. 97–101.
18. Bini, D. (1995). "Toeplitz matrices, algorithms and applications,"
ERCIM News, No. 22, July 1995. Available online:
http://www.ercim.org/publication/Ercim_News/enw22/teoplitz.html.
19. Lee, C. Y., & Chiou, C. W. (2005). Efficient design of low
complexity bit-parallel systolic Hankel multipliers to implement
multiplication in normal and dual bases of GF(2^m). IEICE
Transactions on Fundamentals of Electronics, Communications
and Computer Sciences, E88-A(11), 3169–3179.
doi:10.1093/ietfec/e88-a.11.3169.
20. Pekmestzi, K. Z. (1999). Multiplexer-based array multipliers.
IEEE Transactions on Computers, 48(1), 15–23.
doi:10.1109/12.743408.
21. Sunar, B., & Koc, C. K. (2001). An efficient optimal normal basis
type II multiplier. IEEE Transactions on Computers, 50(1), 83–88.
doi:10.1109/12.902754.
22. Koc, C. K., & Sunar, B. (1998). Low-complexity bit-parallel
canonical and normal multipliers for a class of finite fields. IEEE
Transactions on Computers, 47(3), 353–356.
23. Hasan, M. A., Wang, M. Z., & Bhargava, V. K. (1993). A
modified Massey-Omura parallel multiplier for a class of finite
fields. IEEE Transactions on Computers, 42(10), 1278–1280.
doi:10.1109/12.257715.
24. Galbraith, S. D., & Smart, N. (1999). A cryptographic
application of Weil descent. In Proceedings of the Seventh IMA
Conf. on Cryptography and Coding, LNCS 1764, pp. 191–200.
Springer-Verlag.
Chiou-Yng Lee received his Bachelor's degree (1986) in medical
engineering and M.S. degree in electronic engineering (1992), both
from the Chung Yuan University, Taiwan, and his Ph.D. degree in
electrical engineering from Chang Gung University, Taiwan, in 2001.
From 1988 to the present, he has been a research associate with the
Chunghwa Telecommunication Laboratory in Taiwan. He was with
the department of project planning and taught related field courses at
Ching-Yun Technology University. He is currently an Assistant
Professor in the Department of Computer Information and Network
Engineering at the Lung-Hwa University of Science and Technology.
His research interests include computations in finite fields, error
control coding, signal processing, and digital transmission systems.
He is a senior member of the IEEE and was an honor member of Phi
Tao Phi in 2001.
Che Wun Chiou received his B.S. degree in Electronic Engineering
from Chung Yuan Christian University in 1982, and the M.S. degree and
the Ph.D. degree in Electrical Engineering from National Cheng Kung
University in 1984 and 1989, respectively. From 1990 to 2000, he was
with the Chung Shan Institute of Science and Technology in Taiwan.
He joined the Department of Electronic Engineering, Ching Yun
University in 2000. He is currently a Professor in Computer Science
and Information Engineering at Ching Yun University. His current
research interests include fault-tolerant computing, computer
arithmetic, parallel processing, and cryptography.
Jim-Min Lin received the B.S. degree in Engineering Science and the
M.S. and the Ph.D. degrees in Electrical Engineering, all from
National Cheng Kung University, Tainan, Taiwan, in 1985, 1987,
and 1992, respectively. From February 1993 to July 2005, he was
an Associate Professor at the Department of Information Engineering
and Computer Science, Feng Chia University, Taichung City, Taiwan.
Since August 2005, he has been a Professor at the department.
His research interests include operating systems, software integration/
reuse, embedded systems, software agent technology, and computer
arithmetic.