Correlation Power Analysis with a Leakage
Model
Eric Brier, Christophe Clavier, and Francis Olivier
Gemplus Card International, France
Security Technology Department
{eric.brier, christophe.clavier, francis.olivier}@gemplus.com
Abstract. A classical model is used for the power consumption of cryptographic devices. It is based on the Hamming distance of the data handled with regard to an unknown but constant reference state. Once validated experimentally, it allows an optimal attack to be derived, called Correlation Power Analysis. It also explains the defects of former approaches such as Differential Power Analysis.
Keywords: Correlation factor, CPA, DPA, Hamming distance, power analysis, DES, AES, secure cryptographic device, side channel.
1 Introduction
In the scope of statistical power analysis against cryptographic devices, two
historical trends can be observed. The first one is the well known differential
power analysis (DPA) introduced by Paul Kocher [12, 13] and formalized by
Thomas Messerges et al. [16]. The second one has been suggested in various papers [8, 14, 18], which proposed using the correlation factor between the power samples and the Hamming weight of the handled data. Both approaches exhibit
some limitations due to unrealistic assumptions and model imperfections that
will be examined more thoroughly in this paper. This work follows previous
studies aiming at either improving the Hamming weight model [2], or enhancing
the DPA itself by various means [6, 4].
The proposed approach is based on the Hamming distance model, which can be seen as a generalization of the Hamming weight model. All its basic assumptions were already mentioned in various papers from the year 2000 onwards [16, 8, 6, 2]. But they remained allusive, offered only as possible explanations of DPA defects, and never led to any complete and convenient exploitation. Our experimental work is a synthesis of those former approaches, aiming to give full insight into the data leakage.
Following [8, 14, 18] we propose to use the correlation power analysis (CPA) to
identify the parameters of the leakage model. Then we show that sound and
efficient attacks can be conducted against unprotected implementations of many
algorithms such as DES or AES. This study deliberately restricts itself to the
scope of secret key cryptography although it may be extended beyond.
This paper is organized as follows: Section 2 introduces the Hamming distance model and Section 3 proves the relevance of the correlation factor. The model-based correlation attack is described in Section 4, along with the impact of model errors. Section 5 addresses the estimation problem, and the experimental results which validate the model are presented in Section 6. Section 7 contains the comparative study with DPA and addresses more specifically the so-called “ghost peaks” problem encountered by those who face erroneous conclusions when implementing classical DPA on the substitution boxes of the first DES round: it is shown there how the proposed model explains many defects of DPA and how correlation power analysis can help in conducting sound attacks in optimal conditions. Our conclusion summarizes the advantages and drawbacks of CPA versus DPA and recalls that countermeasures work against both methods alike.
2 The Hamming Distance Consumption Model
Classically, most power analyses found in literature are based upon the Hamming
weight model [13, 16], that is, the number of bits set in a data word. In an m-bit microprocessor, binary data is coded $D = \sum_{j=0}^{m-1} d_j 2^j$, with bit values $d_j = 0$ or $1$. Its Hamming weight is simply the number of bits set to 1, $H(D) = \sum_{j=0}^{m-1} d_j$. Its integer values lie between 0 and $m$. If $D$ contains $m$ independent and uniformly distributed bits, the whole word has an average Hamming weight $\mu_H = m/2$ and a variance $\sigma_H^2 = m/4$.
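As a minimal illustration of these definitions (our sketch, not part of the paper), the following Python code computes Hamming weights and verifies the stated mean $m/2$ and variance $m/4$ for an 8-bit word:

```python
def hamming_weight(x: int) -> int:
    """Number of bits set to 1 in x."""
    return bin(x).count("1")

def hamming_distance(d: int, r: int) -> int:
    """Number of bit positions in which d and r differ: H(d XOR r)."""
    return hamming_weight(d ^ r)

m = 8
# Enumerate all 8-bit words, i.e. the full uniform distribution.
weights = [hamming_weight(d) for d in range(2 ** m)]
mean = sum(weights) / len(weights)
var = sum((w - mean) ** 2 for w in weights) / len(weights)
print(mean, var)  # 4.0 2.0, i.e. m/2 and m/4 for m = 8
```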
It is generally assumed that the data leakage through the power side-channel
depends on the number of bits switching from one state to the other [6, 8] at a
given time. A microprocessor is modeled as a state-machine where transitions
from state to state are triggered by events such as the edges of a clock signal.
This seems relevant when looking at a logical elementary gate as implemented in
CMOS technology. The current consumed is related to the energy required to flip
the bits from one state to the next. It is composed of two main contributions: the
capacitor’s charge and the short circuit induced by the gate transition. Curiously,
this elementary behavior is commonly admitted but has never given rise to any
satisfactory model that is widely applicable. Only hardware designers are famil-
iar with simulation tools to foresee the current consumption of microelectronic
devices.
If the transition model is adopted, a basic question is posed: what is the refer-
ence state from which the bits are switched? We assume here that this reference
state is a constant machine word, R, which is unknown, but not necessarily
zero. It will always be the same if the same data manipulation always occurs at
the same time, although this assumes the absence of any desynchronizing effect.
Moreover, it is assumed that switching a bit from 0 to 1 or from 1 to 0 requires
the same amount of energy and that all the machine bits handled at a given
time are perfectly balanced and consume the same.
These restrictive assumptions are quite realistic and affordable without any
thorough knowledge of microelectronic devices. They lead to a convenient ex-
pression for the leakage model. Indeed, the number of flipping bits to go from $R$ to $D$ is described by $H(D \oplus R)$, also called the Hamming distance between $D$ and $R$. This statement encloses the Hamming weight model, which assumes that $R = 0$. If $D$ is a uniform random variable, so is $D \oplus R$, and $H(D \oplus R)$ has the same mean $m/2$ and variance $m/4$ as $H(D)$.
We also assume a linear relationship between the current consumption and $H(D \oplus R)$. This can be seen as a limitation, but considering a chip as a large set of elementary electrical components, this linear model fits reality quite well. It does not represent the entire consumption of a chip but only the data-dependent part. This does not seem unrealistic, because the bus lines are usually considered the most consuming elements within a micro-controller. Everything else in the power consumption of a chip is assigned to a term denoted $b$, which is assumed independent of the other variables: $b$ encloses offsets, time-dependent components and noise. Therefore the basic model for the data dependency can be written:

$$W = a \, H(D \oplus R) + b$$

where $a$ is a scalar gain between the Hamming distance and the consumed power $W$.
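To make the model concrete, here is a hedged simulation of this leakage (our illustration; the gain a = 1.0, the Gaussian form of b and the noise level are arbitrary assumptions, not values from the paper):

```python
import random

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def leakage(d: int, r: int, a: float = 1.0, noise_sd: float = 1.0) -> float:
    """W = a * H(d XOR r) + b, with b drawn as Gaussian noise.

    a, noise_sd and the Gaussian noise model are illustrative assumptions."""
    return a * hamming_weight(d ^ r) + random.gauss(0.0, noise_sd)

R = 0xB7  # example reference state (the paper later identifies this value on one chip)
# One simulated power sample per random 8-bit data word.
traces = [(d, leakage(d, R)) for d in (random.randrange(256) for _ in range(1000))]
```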
3 The Linear Correlation Factor
A linear model implies some relationships between the variances of the different terms considered as random variables: $\sigma_W^2 = a^2 \sigma_H^2 + \sigma_b^2$. Classical statistics introduce the correlation factor $\rho_{WH}$ between the Hamming distance and the measured power to assess how well the linear model fits. It is the covariance between both random variables normalized by the product of their standard deviations. Under the uncorrelated-noise assumption, this definition leads to:

$$\rho_{WH} = \frac{\mathrm{cov}(W, H)}{\sigma_W \sigma_H} = \frac{a \, \sigma_H}{\sigma_W} = \frac{a \, \sigma_H}{\sqrt{a^2 \sigma_H^2 + \sigma_b^2}} = \frac{a \sqrt{m}}{\sqrt{m a^2 + 4 \sigma_b^2}}$$
This equation complies with the well-known property $-1 \le \rho_{WH} \le +1$: for a perfect model the correlation factor tends to $\pm 1$ as the variance of the noise tends to 0, the sign depending on the sign of the linear gain $a$. If the model applies only to $l$ independent bits amongst $m$, a partial correlation still exists:

$$\rho_{WH_{l/m}} = \frac{a \sqrt{l}}{\sqrt{m a^2 + 4 \sigma_b^2}} = \rho_{WH} \sqrt{\frac{l}{m}}$$
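This closed-form expression is easy to check numerically. The sketch below (our illustration, with arbitrary parameters m = 8, a = 1, σ_b = 2) compares an empirical correlation estimate with the formula; both come out near 0.58:

```python
import math
import random

m, a, sigma_b = 8, 1.0, 2.0
N = 50_000
H = [bin(random.randrange(2 ** m)).count("1") for _ in range(N)]
W = [a * h + random.gauss(0.0, sigma_b) for h in H]

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n, sx, sy = len(xs), sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

print(pearson(W, H))                                             # empirical estimate
print(a * math.sqrt(m) / math.sqrt(m * a**2 + 4 * sigma_b**2))   # theory: ~0.577
```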
4 Secret Inference Based on Correlation Power Analysis
The relationships written above show that if the model is valid, the correlation factor is maximized when the noise variance is minimal. This means that $\rho_{WH}$ can help to determine the reference state $R$. Assume, just as in DPA, that a set of known but randomly varying data words $D$ and a set of related power consumptions $W$ are available. If the $2^m$ possible values of $R$ are scanned exhaustively, they can be ranked by the correlation factor they produce when combined with the observation $W$. This is not that expensive for an 8-bit micro-controller, the case with many of today's smart cards, as only 256 values are to be tested. On 32-bit architectures this exhaustive search cannot be applied as such, but it is still possible to work with partial correlation or to introduce prior knowledge.
Let $R$ be the true reference and $H = H(D \oplus R)$ the right prediction of the Hamming distance. Let $R'$ represent a candidate value and $H' = H(D \oplus R')$ the related model. Assume the value $R'$ has $k$ bits that differ from those of $R$; then $H(R \oplus R') = k$. Since $b$ is independent of the other variables, the correlation test leads to (see [5]):

$$\rho_{WH'} = \frac{\mathrm{cov}(aH + b, H')}{\sigma_W \sigma_{H'}} = \frac{a}{\sigma_W} \cdot \frac{\mathrm{cov}(H, H')}{\sigma_{H'}} = \rho_{WH} \, \rho_{HH'} = \rho_{WH} \, \frac{m - 2k}{m}$$

This formula shows how the correlation factor is capable of rejecting wrong candidates for $R$. For instance, if a single bit is wrong amongst an 8-bit word, the correlation is reduced by 1/4. If all the bits are wrong, i.e. $R' = \neg R$, then an anti-correlation should be observed, with $\rho_{WH'} = -\rho_{WH}$. In absolute value, or if the linear gain is assumed positive ($a > 0$), there cannot be any $R'$ leading to a higher correlation rate than $R$. This proves the uniqueness of the solution and therefore how the reference state can be determined.
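The decay factor $(m - 2k)/m$ can be observed directly in simulation. The following hedged sketch (our illustration, not the paper's code) ranks all 256 candidate reference states against simulated traces; the true R should rank first, and a candidate differing in k bits scores close to $\rho_{WH}(m - 2k)/m$:

```python
import math
import random

def hw(x): return bin(x).count("1")

def pearson(xs, ys):
    n, sx, sy = len(xs), sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

R_true, m = 0x5A, 8                      # illustrative true reference state
data = [random.randrange(256) for _ in range(5000)]
W = [hw(d ^ R_true) + random.gauss(0.0, 1.0) for d in data]   # model with a = 1

scores = sorted(((pearson(W, [hw(d ^ cand) for d in data]), cand)
                 for cand in range(256)), reverse=True)
best_rho, best = scores[0]
print(hex(best), best_rho)    # the true R = 0x5A should rank first
k = hw(best ^ R_true)         # 0 for the winner; wrong candidates decay as (m - 2k)/m
```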
This analysis can be performed on the power trace assigned to a piece of code while manipulating known and varying data. If we assume that the handled data is the result of a XOR operation between a secret key word $K$ and a known message word $M$, $D = K \oplus M$, the procedure described above, i.e. exhaustive search on $R$ and correlation test, should lead to $K \oplus R$ associated with $\max(\rho_{WH})$. Indeed, if a correlation occurs when $M$ is handled with respect to $R_1$, another has to occur later on, when $M \oplus K$ is manipulated in turn, possibly with a different reference state $R_2$ (in fact with $K \oplus R_2$, since only $M$ is known). For instance, when considering the first AddRoundKey function at the beginning of the AES algorithm embedded on an 8-bit processor, it is obvious that such a method leads to the whole key masked by the constant reference byte $R_2$. If $R_2$ is the same for all the key bytes, which is highly plausible, only $2^8$ possibilities remain to be tested by exhaustive search to infer the entire key material. This complementary brute force may be avoided if $R_2$ is determined by other means or known to be always equal to 0 (on certain chips).
This attack is not restricted to the $\oplus$ operation. It also applies to many other operators often encountered in secret key cryptography. For instance, other arithmetic or logical operations and look-up tables (LUT) can be treated in the same manner, by using $H(\mathrm{LUT}(M \star K) \oplus R)$, where $\star$ represents the involved function, i.e. $\oplus$, +, −, OR, AND, or whatever operation. Let us notice that the ambiguity between $K$ and $K \oplus R$ is completely removed by the substitution boxes encountered in secret key algorithms, thanks to the non-linearity of the corresponding LUT: this may require exhausting both $K$ and $R$, but only once for $R$ in most cases. To conduct an analysis in the best conditions, we emphasize the benefit of correctly modeling the whole machine word that is actually handled, and its transition with respect to the reference state $R$, which is to be determined as an unknown of the problem.
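As a toy end-to-end illustration of this procedure (a sketch under the model above, not the paper's code), the following recovers the byte guess that maximizes the correlation; note that it equals $K \oplus R$ rather than $K$ itself:

```python
import math
import random

def hw(x): return bin(x).count("1")

def pearson(xs, ys):
    n, sx, sy = len(xs), sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

K, R = 0x3C, 0xB7                       # secret key byte and reference state (illustrative)
msgs = [random.randrange(256) for _ in range(2000)]
W = [hw((msg ^ K) ^ R) + random.gauss(0.0, 1.0) for msg in msgs]  # leakage of D = K xor M

guess = max(range(256), key=lambda g: pearson(W, [hw(msg ^ g) for msg in msgs]))
print(hex(guess), hex(K ^ R))           # the best guess equals K xor R, not K
```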
5 Estimation
In a real case with a set of $N$ power curves $W_i$ and $N$ associated random data words $M_i$, for a given reference state $R$ the known data words produce a set of $N$ predicted Hamming distances $H_{i,R} = H(M_i \oplus R)$. An estimate $\hat{\rho}_{WH}$ of the correlation factor $\rho_{WH}$ is given by the following formula:

$$\hat{\rho}_{WH}(R) = \frac{N \sum W_i H_{i,R} - \sum W_i \sum H_{i,R}}{\sqrt{N \sum W_i^2 - \left(\sum W_i\right)^2}\,\sqrt{N \sum H_{i,R}^2 - \left(\sum H_{i,R}\right)^2}}$$

where the summations are taken over the $N$ samples ($i = 1, \dots, N$) at each time step within the power traces $W_i(t)$.
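In practice the estimate is computed at every time step of the traces. Here is a hedged, vectorized numpy sketch (our illustration; the names `traces` and `data` are assumptions, with `traces` an N×T matrix of measured samples and `data` the N known words):

```python
import numpy as np

def cpa_correlation(traces: np.ndarray, data: np.ndarray, R: int) -> np.ndarray:
    """Return rho_hat(t) for one reference-state candidate R.

    traces: (N, T) array of power samples; data: (N,) array of known words.
    This is the Pearson estimate above, vectorized over all T time steps."""
    H = np.array([bin(int(d) ^ R).count("1") for d in data], dtype=np.float64)
    W = traces.astype(np.float64)
    Hc = H - H.mean()                      # centered predictions
    Wc = W - W.mean(axis=0)                # centered traces, per time step
    num = Hc @ Wc                          # covariance term, shape (T,)
    den = np.sqrt((Hc @ Hc) * (Wc * Wc).sum(axis=0))
    return num / den
```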
It is theoretically difficult to compute the variance of the estimator $\hat{\rho}_{WH}$ with respect to the number of available samples $N$. In practice a few hundred experiments suffice to provide a workable estimate of the correlation factor. $N$ has to be increased with the model variance $m/4$ (higher on a 32-bit architecture) and, obviously, in the presence of measurement noise. The next results will show that this is more than sufficient for conducting reliable tests. The reader is referred to [5] for further discussion about the estimation on experimental data and optimality issues. It is shown there that this approach can be seen as a maximum likelihood model fitting procedure when $R$ is exhausted to maximize $\hat{\rho}_{WH}$.
6 Experimental Results
This section aims at confronting the leakage model with real experiments. General rules of behavior are derived from the analysis of various chips for secure devices conducted during the past years.
Our first experiment was performed on a basic XOR algorithm implemented in an 8-bit chip known for leaking information (more suitable for didactic purposes). The sequence of instructions was simply the following:
- load a byte D1 into the accumulator,
- XOR D1 with a constant D2,
- store the result from the accumulator to a destination memory cell.
The program was executed 256 times with D1 varying from 0 to 255. As displayed in Figure 1, two significant correlation peaks were obtained with two different reference states: the first one being the address of D1, the second one the opcode of the XOR instruction. These curves bring experimental evidence of leakage principles that previous works only hinted at, without going into further detail [16, 8, 6, 17]. They illustrate the most general case of a transfer sequence on a common bus. The address of a data word is transmitted just before its value, which is in turn immediately followed by the opcode of the next instruction being fetched. Such behavior can be observed on a wide variety of chips, even those implementing 16- or 32-bit architectures. Correlation rates ranging from 60% to more than 90% can often be obtained. Figure 2 shows an example of partial correlation on a 32-bit architecture: when only 4 bits are predicted among 32, the correlation loss is about a factor $\sqrt{8} \approx 2.8$, which is consistent with the displayed correlations.

Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower: for varying data (0-255), model array and measurement array taken at the time of the second correlation peak.
This sort of result can be observed on various technologies and implementations. Nevertheless the following restrictions have to be mentioned:
- Sometimes the reference state is systematically 0. This can be assigned to so-called pre-charged logic, where the bus is cleared between each transferred value. Another possible reason is that complex architectures implement separate busses for data and addresses, which may prohibit certain transitions. In all those cases the Hamming weight model is recovered as a particular case of the more general Hamming distance model.
Fig. 2. Two correlation peaks for full word (32 bits) and partial (4 bits) predictions.
According to theory the 20% peak should rather be around 26%.
- The sequence of correlation peaks may sometimes be blurred or spread over time in the presence of a pipeline.
- Some recent technologies implement hardware security features designed to impede statistical power analysis. These countermeasures offer various levels of efficiency, ranging from the most naive and easy to bypass to the most effective, which merely cancel any data dependency.
There are different kinds of countermeasures, completely similar to those designed against DPA:
- Some of them consist in introducing desynchronization in the execution of the process, so that the curves are no longer aligned within a same acquisition set. For that purpose there exist various techniques, such as fake cycle insertion, unstable clocking or random delays [6, 18]. In certain cases their effect can be corrected by applying appropriate signal processing.
- Other countermeasures consist in blurring the power traces with additional noise or filtering circuitry [19]. Sometimes they can be bypassed by curve selection and/or averaging, or by using another side channel such as electromagnetic radiation [9, 1].
- The data can also be ciphered dynamically during a process by hardware (such as bus encryption) or software means (data masking with a random [11, 7, 20, 10]), so that the handled variables become unpredictable: then no correlation can be expected anymore. In theory, sophisticated attacks such as higher-order analysis [15] can overcome the data masking method; but they are easy to thwart in practice, by using desynchronization for instance.
Indeed, if implemented alone, none of these countermeasures can be considered absolutely secure against statistical analyses. They just increase the amount of effort and level of expertise required to achieve an attack. However, combined defenses, implementing at least two of these countermeasures, prove to be very efficient and practically dissuasive. The state of the art of countermeasures in the design of tamper-resistant devices has made big advances in recent years. It is now admitted that security requirements include sound implementations as much as robust cryptographic schemes.
7 Comparison with DPA
This section compares the proposed CPA method with Differential Power Analysis (DPA). It refers to the earlier work by Messerges et al. [16, 17], who formalized the ideas previously suggested by Kocher [12, 13]. A critical study is proposed in [5].
7.1 Practical problems with DPA: the “ghost peaks”
Hereafter we consider only the practical implementation of DPA against the DES substitutions (first round). In fact this well-known attack works well only if the following assumptions are fulfilled:
1. Word space assumption: within the word hosting the predicted bit, the contribution of the non-targeted bits is independent of the targeted bit value. Their average influence in the curve pack of 0 is the same as in the curve pack of 1, so the attacker does not need to care about these bits.
2. Guess space assumption: the predicted value of the targeted bit for any wrong sub-key guess does not depend on the value associated with the correct guess.
3. Time space assumption: the power consumption W does not depend on the value of the targeted bit except when it is explicitly handled.
But when confronted with experiment, the attack comes up against the following facts.
Fact A. For the correct guess, DPA peaks also appear when the targeted bit is not explicitly handled. This is worth noticing, albeit not really embarrassing. However, it contradicts the third assumption.
Fact B. Some DPA peaks also appear for wrong guesses: they are called “ghost peaks”. This fact is more problematic for making a sound decision and contradicts the second assumption.
Fact C. The true DPA peak given by the right guess may be smaller than some ghost peaks, and even null or negative! This seems somewhat amazing and quite confusing for an attacker. The reasons must be sought in the crudeness of the optimistic first assumption.
7.2 The “ghost peaks” explanation
With the help of a thorough analysis of the substitution boxes and the Hamming distance model, it is now possible to explain the observed facts and show how wrong the basic assumptions of DPA can be.
Fact A. As a matter of fact, some data handled along the algorithm may be partially correlated with the targeted bit. This is not that surprising when looking at the structure of the DES. A bit taken from the output nibble of an SBox has a lifetime lasting at least until the end of the round (and beyond, if the left part of the IP output does not vary too much). A DPA peak rises each time this bit and its 3 peer bits undergo the subsequent P permutation, since they all belong to the same machine word.
Fact B. The reason why wrong guesses may generate DPA peaks is that the distributions of an SBox output bit for two different guesses are deterministic and thus possibly partially correlated. The following example is very convincing on that point. Let us consider the leftmost bit of the fifth SBox of the DES when the input data D varies from 0 to 63 and is combined with two different sub-keys: MSB(SBox5(D ⊕ 0x00)) and MSB(SBox5(D ⊕ 0x36)). Both series of bits are listed hereafter, with their bitwise XOR on the third line:

1101101010010110001001011001001110101001011011010101001000101101
1001101011010110001001011101001010101101011010010101001000111001
0100000001000000000000000100000100000100000001000000000000010100

The third line contains 8 set bits, revealing only eight prediction errors among 64. This example shows that a wrong guess, say 0, can provide a good prediction at a rate of 56/64, which is not that far from the correct one, 0x36. The result would be equivalent for any other pair of sub-keys K and K ⊕ 0x36. Consequently a substantial concurrent DPA peak will appear at the same location as the right one. The weakness of the contrast will disturb the ranking of the guesses, especially in presence of a high SNR.
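The same phenomenon can be reproduced with any fixed substitution table. Below is a minimal sketch (our illustration, not the paper's code) that uses a random stand-in S-box in place of the real DES SBox5 table (which can be taken from FIPS 46-3 to reproduce the 56/64 figure above); it counts, for each guess, how often its predicted MSB agrees with the prediction of the correct sub-key 0x36:

```python
import random

random.seed(0)
# Stand-in 6-bit -> 4-bit S-box; substitute the real DES SBox5 to
# reproduce the exact figures from the paper.
sbox = [random.randrange(16) for _ in range(64)]
msb = lambda v: (v >> 3) & 1

right = 0x36
ref = [msb(sbox[d ^ right]) for d in range(64)]
agreement = {g: sum(msb(sbox[d ^ g]) == ref[d] for d in range(64))
             for g in range(64)}
# The right guess agrees 64/64; some wrong guesses agree far more often
# than the 32/64 expected for independent predictions -> ghost peaks.
print(sorted(agreement.items(), key=lambda kv: -kv[1])[:5])
```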
Fact C. DPA implicitly considers the word bits carried along with the targeted bit as uniformly distributed and independent of the targeted one. This is erroneous because the implementation introduces a deterministic link between their values. Their asymmetric contribution may affect the height and the sign of a DPA peak. This may bias the analysis, on the one hand by shrinking relevant peaks, on the other hand by enhancing meaningless ones. There exists a well-known trick to bypass this difficulty, as mentioned in [4]. It consists in shifting the DPA attack a little further in the processing and performing the prediction just after the end of the first round, when the right part of the data (32 bits) is XORed with the left part of the IP output. As the message is chosen freely, this represents an opportunity to re-balance the loss of randomness by bringing in fresh random data. But this does not fix Fact B in the general case.
To get rid of these ambiguities, the model-based approach aims at taking the whole information into account. This requires introducing the notion of algorithmic implementation, which DPA assumptions completely occult.
When considering the substitution boxes of the DES, one cannot avoid recalling that the output values are 4-bit values. Although these 4 bits are in principle equivalent as DPA selection bits, they live together with 4 other bits in the context of an 8-bit microprocessor. Efficient implementations tend to exploit those 4 bits to save some storage space in constrained environments like smart card chips. A trick referred to as “SBox compression” consists in storing 2 SBox values within a same byte, so the required space is halved. There are different ways to implement this. Let us consider for instance the first 2 boxes: instead of allocating 2 different arrays, it is more efficient to build up the following look-up table: LUT12(k) = SBox1(k) ‖ SBox2(k). For a given input index k, the array byte contains the values of two neighboring boxes. Then, according to the Hamming distance consumption model, the power trace should vary like (see the sketch after this list):
- H(LUT12(D1 ⊕ K1) ⊕ R1) when computing SBox1,
- H(LUT12(D2 ⊕ K2) ⊕ R2) when computing SBox2.
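To make the binding explicit, here is a hedged sketch of such a compressed implementation and the resulting leakage prediction (our illustration; the tables are random placeholders, not the real DES S-boxes):

```python
import random

random.seed(2)
# Placeholder 6-bit -> 4-bit tables standing in for DES SBox1 and SBox2.
SBOX1 = [random.randrange(16) for _ in range(64)]
SBOX2 = [random.randrange(16) for _ in range(64)]

# "SBox compression": one byte stores SBox1(k) in the high nibble
# and SBox2(k) in the low nibble, halving the storage.
LUT12 = [(SBOX1[k] << 4) | SBOX2[k] for k in range(64)]

def hw(x): return bin(x).count("1")

def predicted_leakage_sbox1(d1: int, k1: int, r1: int) -> int:
    # The whole byte leaks: H(LUT12(D1 xor K1) xor R1), so SBox2's nibble
    # contributes to the power model even when "computing SBox1".
    return hw(LUT12[d1 ^ k1] ^ r1)
```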
If the values are bound like this, their respective bits can no longer be considered independent. To prove this assertion we conducted an experiment on a real 8-bit implementation that was not protected by any DPA countermeasure. Working in a “white box” mode, the model parameters had been previously calibrated with respect to the measured consumption traces. The reference state R = 0xB7 had been identified as the opcode of an instruction transferring the content of the accumulator to RAM using direct addressing. The model fitted the experimental data samples quite well; their correlation factor even reached 97%. So we were able to simulate the real consumption of the SBox output with high accuracy. The study then consisted in applying a classical single-bit DPA to the output of SBox1, in parallel on both sets of 200 data samples: the measured and the simulated power consumptions.
As Figure 3 shows, the simulated and experimental DPA biases match particularly well. One can notice the following points:
- The 4 output bits are far from being equivalent.
- The polarity of the peak associated with the correct guess 24 depends on the polarity of the reference state. As R = 0xB7, its leftmost nibble aligned with SBox1 is 0xB = '1011', and only selection bit 2 (counted from the left) results in a positive peak, whereas the 3 others undergo a transition from 1 to 0, leading to a negative peak.
- In addition this bit is a somewhat lucky bit, because when it is used as selection bit only guess 50 competes with the right sub-key. This is a particularly favorable case occurring here on SBox1, partly due to the set of 200 used messages. It cannot be extrapolated to other boxes.
- The dispersion of the DPA bias over the guesses is quite confused (see bit 4).

The quality of the modeling proves that those facts cannot be blamed on the number of acquisitions. Increasing it far beyond 200 does not help: the level of the peaks with respect to the guesses does not evolve and converges to the same ranking. This particular counter-example proves that the ambiguity of DPA does not lie in imperfect estimation but in wrong basic hypotheses.
Fig. 3. DPA biases on SBox1 versus guesses for selection bits 1, 2, 3 and 4, on modeled and experimental data; the correct guess is 24.
7.3 Results of Model-Based CPA
For comparison, the table hereafter provides the ranking of the first 6 guesses sorted by decreasing correlation rate. This result is obtained with as few as 40 curves! The full key is 11 22 33 44 55 66 77 88 in hexadecimal format and the corresponding sub-keys at the first round are 24, 19, 8, 8, 5, 50, 43, 2 in decimal representation.

   SBox1     SBox2     SBox3     SBox4     SBox5     SBox6     SBox7     SBox8
  K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax
 24   92%  19   90%   8   87%   8   88%   5   91%  50   92%  43   89%   2   89%
 48   74%  18   77%  18   69%  44   67%  32   71%  25   71%  42   76%  28   77%
 01   74%  57   70%  05   68%  49   67%  25   70%  05   70%  52   70%  61   76%
 33   74%  02   70%  22   66%  02   66%  34   69%  54   70%  38   69%  41   72%
 15   74%  12   68%  58   66%  29   66%  61   67%  29   69%   0   69%  37   70%
 06   74%  13   67%  43   65%  37   65%  37   67%  53   67%  30   68%  15   69%
This table shows that the correct guess always stands out with a good contrast.
Therefore a sound decision can be made without any ambiguity despite a rough
estimation of ρmax.
A similar attack has also been conducted on a 32-bit implementation, in a white box mode, with perfect knowledge of the implemented substitution tables and of the reference state, which was 0. The key was 7C A1 10 45 4A 1A 6E 57 in hexadecimal format and the related sub-keys at the first round were 28, 12, 43, 0, 15, 60, 5, 38 in decimal representation. The number of curves is 100. As the next table shows, the contrast is good between the correct guess and the most competing wrong guess (around 40% on boxes 1 to 4). The correlation rate is not that high on boxes 5 to 8, definitely because of partial and imperfect modeling, but it proves to remain exploitable and is thus a robust indicator. When the number of bits per machine word is greater, the contrast between the guesses is relatively enhanced, but finding the right model could be more difficult in a black box mode.

   SBox1     SBox2     SBox3     SBox4     SBox5     SBox6     SBox7     SBox8
  K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax   K  ρmax
 28   77%  12   69%  43   73%   0   82%  15   52%  60   51%   5   51%  38   47%
 19   36%  27   29%  40   43%  29   43%  03   33%  10   34%  15   40%  05   29%
 42   35%  24   27%  36   35%  20   35%  58   30%  58   33%   6   29%  55   26%
 61   31%  58   27%  06   33%  60   32%  10   30%  18   31%  12   29%  39   25%
8 Conclusion
Our experience on a large set of smart card chips over the last years has convinced us of the validity of the Hamming distance model and of the advantages of the CPA method over DPA, in terms of efficiency, robustness and number of experiments. An important and reassuring conclusion is that all the countermeasures designed against DPA offer the same defensive efficiency against the model-based CPA attack. This is not that surprising, since those countermeasures aim at undermining the common prerequisites that both approaches rely on: side-channel observability and intermediate variable predictability.
The main drawback of CPA regards the characterization of the leakage model parameters. As it is more demanding than DPA, the method may seem more difficult to implement. However it may be objected that:
- A statistical power analysis of any kind is never conducted blindly, without any preliminary reverse engineering (process identification, bit tracing): this is the opportunity to quantify the leakage rate by CPA on known data.
- DPA requires more sample curves anyway, since all the unpredicted data bits penalize the signal-to-noise ratio (see [5]).
- If DPA fails for lack of implementation knowledge (increasing the number of curves does not necessarily help), we have shown how to infer part of this information without excessive effort: for instance, the reference state is to be found by exhaustive search only once in general.
- There exist many situations where the implementation variants (like SBox implementation in DES) are not so numerous, because of operational constraints.
- If part of the model cannot be inferred (SBox implementation in DES, hardware co-processor), partial correlation with the remainder may still provide exploitable indications.

Eventually DPA remains relevant in the case of very special architectures for which the model may be completely out of reach, as in certain hard-wired co-processors.
References
1. D. Agrawal, B. Archambeault, J.R. Rao, and P. Rohatgi. The EM side-channel(s): attacks and assessment methodologies. In Cryptographic Hardware and Embedded Systems — CHES 2002, LNCS 2523, pp. 29–45, Springer-Verlag, 2002. See also http://www.research.ibm.com/intsec/emf-paper.ps.
2. M.L. Akkar, R. Bévan, P. Dischamp, and D. Moyart. Power analysis, what is now possible... In Advances in Cryptology — ASIACRYPT 2000, LNCS 1976, pp. 489–502, Springer-Verlag, 2000.
3. M.L. Akkar and C. Giraud. An implementation of DES and AES secure against some attacks. In Cryptographic Hardware and Embedded Systems — CHES 2001, LNCS 2162, pp. 309–318, Springer-Verlag, 2001.
4. R. Bévan and R. Knudsen. Ways to enhance differential power analysis. In Information Security and Cryptology — ICISC 2002, LNCS 2587, pp. 327–342, Springer-Verlag, 2002.
5. E. Brier, C. Clavier, and F. Olivier. Optimal statistical power analysis. http://eprint.iacr.org/2003/152/.
6. C. Clavier, J.-S. Coron, and N. Dabbous. Differential power analysis in the presence of hardware countermeasures. In Cryptographic Hardware and Embedded Systems — CHES 2000, LNCS 1965, pp. 252–263, Springer-Verlag, 2000.
7. J.-S. Coron and L. Goubin. On Boolean and arithmetic masking against differential power analysis. In Cryptographic Hardware and Embedded Systems — CHES 2000, LNCS 1965, pp. 231–237, Springer-Verlag, 2000.
8. J.-S. Coron, P. Kocher, and D. Naccache. Statistics and secret leakage. In Financial Cryptography (FC 2000), LNCS 1972, pp. 157–173, Springer-Verlag, 2001.
9. K. Gandolfi, C. Mourtel, and F. Olivier. Electromagnetic attacks: concrete results. In Cryptographic Hardware and Embedded Systems — CHES 2001, LNCS 2162, pp. 252–261, Springer-Verlag, 2001.
10. J. Golić and C. Tymen. Multiplicative masking and power analysis of AES. In Cryptographic Hardware and Embedded Systems — CHES 2002, LNCS 2523, pp. 198–212, Springer-Verlag, 2002.
11. L. Goubin and J. Patarin. DES and differential power analysis. In Cryptographic Hardware and Embedded Systems (CHES '99), LNCS 1717, pp. 158–172, Springer-Verlag, 1999.
12. P. Kocher, J. Jaffe, and B. Jun. Introduction to differential power analysis and related attacks. http://www.cryptography.com.
13. P. Kocher, J. Jaffe, and B. Jun. Differential power analysis. In Advances in Cryptology — CRYPTO '99, LNCS 1666, pp. 388–397, Springer-Verlag, 1999.
14. R. Mayer-Sommer. Smartly analysing the simplicity and the power of simple power analysis on smartcards. In Cryptographic Hardware and Embedded Systems — CHES 2000, LNCS 1965, pp. 78–92, Springer-Verlag, 2000.
15. T.S. Messerges. Using second-order power analysis to attack DPA resistant software. In Cryptographic Hardware and Embedded Systems — CHES 2000, LNCS 1965, pp. 238–252, Springer-Verlag, 2000.
16. T. Messerges, E. Dabbish, and R. Sloan. Investigation of power analysis attacks on smartcards. In USENIX Workshop on Smartcard Technology, 1999. http://www.usenix.org.
17. T. Messerges, E. Dabbish, and R. Sloan. Examining smart-card security under the threat of power analysis attacks. IEEE Transactions on Computers, 51(5):541–552, May 2002.
18. E. Oswald. On Side-Channel Attacks and the Application of Algorithmic Countermeasures. PhD thesis, Faculty of Science, University of Technology Graz (IAIK-TUG), Austria, May 2003.
19. A. Shamir. Protecting smart cards from passive power analysis with detached power supplies. In Cryptographic Hardware and Embedded Systems — CHES 2000, LNCS 1965, pp. 71–77, Springer-Verlag, 2000.
20. E. Trichina, D. De Seta, and L. Germani. Simplified adaptive multiplicative masking for AES. In Cryptographic Hardware and Embedded Systems — CHES 2002, LNCS 2523, pp. 187–197, Springer-Verlag, 2002.