Correlation Power Analysis with a Leakage
Model
Eric Brier, Christophe Clavier, and Francis Olivier
Gemplus Card International, France
Security Technology Department
{eric.brier, christophe.clavier, francis.olivier}@gemplus.com
Abstract. A classical model is used for the power consumption of cryptographic devices. It is based on the Hamming distance of the data handled with regard to an unknown but constant reference state. Once validated experimentally it allows an optimal attack to be derived called Correlation Power Analysis. It also explains the defects of former approaches such as Differential Power Analysis.
Keywords: Correlation factor, CPA, DPA, Hamming distance, power analysis, DES, AES, secure cryptographic device, side channel.
1 Introduction
In the scope of statistical power analysis against cryptographic devices, two
historical trends can be observed. The first one is the well known differential
power analysis (DPA) introduced by Paul Kocher [12, 13] and formalized by
Thomas Messerges et al. [16]. The second one has been suggested in various papers [8, 14, 18] and proposes to use the correlation factor between the power samples and the Hamming weight of the handled data. Both approaches exhibit
some limitations due to unrealistic assumptions and model imperfections that
will be examined more thoroughly in this paper. This work follows previous
studies aiming at either improving the Hamming weight model [2], or enhancing
the DPA itself by various means [6, 4].
The proposed approach is based on the Hamming distance model, which can be seen as a generalization of the Hamming weight model. All its basic assumptions were already mentioned in various papers from the year 2000 onwards [16, 8, 6, 2]. But they remained allusive, as possible explanations of DPA defects, and never led to any complete and convenient exploitation. Our experimental work is a synthesis of those former approaches intended to give full insight into the data leakage. Following [8, 14, 18] we propose to use correlation power analysis (CPA) to identify the parameters of the leakage model. Then we show that sound and efficient attacks can be conducted against unprotected implementations of many algorithms such as DES or AES. This study deliberately restricts itself to the scope of secret-key cryptography, although it may be extended beyond.
This paper is organized as follows: Section 2 introduces the Hamming distance model and Section 3 proves the relevance of the correlation factor. The model-based correlation attack is described in Section 4, together with the impact of model errors. Section 5 addresses the estimation problem, and the experimental results which validate the model are presented in Section 6. Section 7 contains the comparative study with DPA and addresses more specifically the so-called “ghost peaks” problem encountered by those who face erroneous conclusions when implementing classical DPA on the substitution boxes of the first DES round: it is shown there how the proposed model explains many defects of DPA and how correlation power analysis can help in conducting sound attacks in optimal conditions. Our conclusion summarizes the advantages and drawbacks of CPA versus DPA and recalls that countermeasures work against both methods equally well.
2 The Hamming Distance Consumption Model
Classically, most power analyses found in the literature are based upon the Hamming weight model [13, 16], that is, the number of bits set in a data word. In an m-bit microprocessor, binary data is coded as $D = \sum_{j=0}^{m-1} d_j 2^j$, with bit values $d_j = 0$ or $1$. Its Hamming weight is simply the number of bits set to 1: $H(D) = \sum_{j=0}^{m-1} d_j$. Its integer values lie between 0 and $m$. If $D$ contains $m$ independent and uniformly distributed bits, the whole word has an average Hamming weight $\mu_H = m/2$ and a variance $\sigma_H^2 = m/4$.
It is generally assumed that the data leakage through the power side-channel
depends on the number of bits switching from one state to the other [6, 8] at a
given time. A microprocessor is modeled as a state-machine where transitions
from state to state are triggered by events such as the edges of a clock signal.
This seems relevant when looking at a logical elementary gate as implemented in
CMOS technology. The current consumed is related to the energy required to flip
the bits from one state to the next. It is composed of two main contributions: the
capacitor’s charge and the short circuit induced by the gate transition. Curiously,
this elementary behavior is commonly admitted but has never given rise to any
satisfactory model that is widely applicable. Only hardware designers are familiar with simulation tools to foresee the current consumption of microelectronic devices.
If the transition model is adopted, a basic question is posed: what is the reference state from which the bits are switched? We assume here that this reference state is a constant machine word, $R$, which is unknown, but not necessarily zero. It will always be the same if the same data manipulation always occurs at the same time, although this assumes the absence of any desynchronizing effect.
Moreover, it is assumed that switching a bit from 0 to 1 or from 1 to 0 requires
the same amount of energy and that all the machine bits handled at a given
time are perfectly balanced and consume the same.
These restrictive assumptions are quite realistic and affordable without any
thorough knowledge of microelectronic devices. They lead to a convenient expression for the leakage model. Indeed, the number of flipping bits to go from $R$ to $D$ is described by $H(D \oplus R)$, also called the Hamming distance between $D$ and $R$. This statement encompasses the Hamming weight model, which assumes that $R = 0$. If $D$ is a uniform random variable, so is $D \oplus R$, and $H(D \oplus R)$ has the same mean $m/2$ and variance $m/4$ as $H(D)$.
We also assume a linear relationship between the current consumption and $H(D \oplus R)$. This can be seen as a limitation, but considering a chip as a large set of elementary electrical components, this linear model fits reality quite well. It does not represent the entire consumption of a chip but only the data-dependent part. This does not seem unrealistic because the bus lines are usually considered the most consuming elements within a micro-controller. All the remaining contributions to the power consumption of a chip are gathered in a term denoted $b$, which is assumed independent from the other variables: $b$ encloses offsets, time-dependent components and noise. The basic model for the data dependency can therefore be written:
$$W = a \cdot H(D \oplus R) + b$$
where $a$ is a scalar gain between the Hamming distance and the consumed power $W$.
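To make the model concrete, here is a minimal simulation sketch of this leakage model in Python/NumPy; the gain, reference byte and noise level are illustrative values chosen for the sketch, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def hw(x):
    """Hamming weight of each byte in an integer array."""
    bits = np.unpackbits(np.asarray(x, dtype=np.uint8).reshape(-1, 1), axis=1)
    return bits.sum(axis=1)

# Illustrative parameters -- assumptions for the sketch, not values from the paper
a, R, sigma_b = 1.0, 0xB7, 1.0      # linear gain, reference state, noise std dev
m = 8                               # machine word size in bits

D = rng.integers(0, 256, size=1000)                      # known random data words
W = a * hw(D ^ R) + sigma_b * rng.normal(size=D.size)    # W = a*H(D xor R) + b

# The empirical correlation approaches a*sqrt(m)/sqrt(m*a^2 + 4*sigma_b^2) ~ 0.82
print(np.corrcoef(W, hw(D ^ R))[0, 1])
```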
3 The Linear Correlation Factor
A linear model implies some relationships between the variances of the different terms considered as random variables: $\sigma_W^2 = a^2\sigma_H^2 + \sigma_b^2$. Classical statistics introduce the correlation factor $\rho_{WH}$ between the Hamming distance and the measured power to assess how well the linear model fits. It is the covariance between both random variables normalized by the product of their standard deviations. Under the uncorrelated-noise assumption, this definition leads to:
$$\rho_{WH} = \frac{\operatorname{cov}(W, H)}{\sigma_W \sigma_H} = \frac{a\,\sigma_H}{\sigma_W} = \frac{a\,\sigma_H}{\sqrt{a^2\sigma_H^2 + \sigma_b^2}} = \frac{a\sqrt{m}}{\sqrt{m a^2 + 4\sigma_b^2}}$$
This equation complies with the well-known property $-1 \leq \rho_{WH} \leq +1$: for a perfect model the correlation factor tends to $\pm 1$ as the variance of the noise tends to 0, the sign depending on the sign of the linear gain $a$. If the model applies only to $l$ independent bits amongst $m$, a partial correlation still exists:
$$\rho_{WH_{l/m}} = \frac{a\sqrt{l}}{\sqrt{m a^2 + 4\sigma_b^2}} = \rho_{WH}\,\sqrt{\frac{l}{m}}$$
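As a worked instance of the partial-correlation formula (the numbers follow directly from it; the $l = 4$, $m = 32$ case anticipates the 32-bit experiment of Section 6):
$$\rho_{WH_{4/32}} = \rho_{WH}\,\sqrt{\frac{4}{32}} = \frac{\rho_{WH}}{\sqrt{8}} \approx 0.35\,\rho_{WH}$$
so predicting only 4 bits of a 32-bit word divides the observable correlation by $\sqrt{8} \approx 2.8$.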
4 Secret Inference Based on Correlation Power Analysis
The relationships written above show that if the model is valid, the correlation factor is maximized when the noise variance is minimal. This means that $\rho_{WH}$ can help to determine the reference state $R$. Assume, just as in DPA, that a set of known but randomly varying data $D$ and a set of related power consumptions $W$ are available. If the $2^m$ possible values of $R$ are scanned exhaustively, they can be ranked by the correlation factor they produce when combined with the observation $W$. This is not that expensive on an 8-bit micro-controller, as is the case for many of today's smart cards, since only 256 values are to be tested. On 32-bit architectures this exhaustive search cannot be applied as such, but it is still possible to work with partial correlation or to introduce prior knowledge.
Let $R$ be the true reference and $H = H(D \oplus R)$ the right prediction on the Hamming distance. Let $R'$ represent a candidate value and $H'$ the related model $H' = H(D \oplus R')$. Assume a value of $R'$ that has $k$ bits differing from those of $R$; then $H(R \oplus R') = k$. Since $b$ is independent from the other variables, the correlation test leads to (see [5]):
$$\rho_{WH'} = \frac{\operatorname{cov}(aH + b, H')}{\sigma_W \sigma_{H'}} = \frac{a}{\sigma_W}\,\frac{\operatorname{cov}(H, H')}{\sigma_{H'}} = \rho_{WH}\,\rho_{HH'} = \rho_{WH}\,\frac{m - 2k}{m}$$
This formula shows how the correlation factor is capable of rejecting wrong candidates for $R$. For instance, if a single bit is wrong amongst an 8-bit word, the correlation is reduced by $1/4$. If all the bits are wrong, i.e. $R' = \neg R$, then an anti-correlation should be observed, with $\rho_{WH'} = -\rho_{WH}$. In absolute value, or if the linear gain is assumed positive ($a > 0$), there cannot be any $R'$ leading to a higher correlation rate than $R$. This proves the uniqueness of the solution and therefore how the reference state can be determined.
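A quick Monte Carlo check of this rejection formula, as a sketch with illustrative parameters (the reference byte and candidates are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def hw(x):
    return np.unpackbits(np.asarray(x, dtype=np.uint8).reshape(-1, 1), axis=1).sum(axis=1)

m, R = 8, 0xB7                                  # word size, true reference (illustrative)
D = rng.integers(0, 256, size=50000)
W = hw(D ^ R) + rng.normal(size=D.size)         # a = 1, unit-variance noise

rho_true = np.corrcoef(W, hw(D ^ R))[0, 1]
for k, Rp in [(1, 0xB6), (4, 0xB7 ^ 0x0F), (8, 0x48)]:   # 0x48 = complement of 0xB7
    rho = np.corrcoef(W, hw(D ^ Rp))[0, 1]
    print(f"k={k}: measured ratio {rho / rho_true:+.2f}, predicted {(m - 2*k) / m:+.2f}")
```

For one wrong bit the ratio settles near $+0.75$, and the complemented reference indeed yields $-1$, the anti-correlation predicted above.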
This analysis can be performed on the power trace assigned to a piece of code that manipulates known and varying data. If we assume that the handled data is the result of an XOR operation between a secret key word $K$ and a known message word $M$, $D = K \oplus M$, the procedure described above, i.e. exhaustive search on $R$ and correlation test, should lead to $K \oplus R$ associated with $\max(\rho_{WH})$. Indeed, if a correlation occurs when $M$ is handled with respect to $R_1$, another has to occur later on, when $M \oplus K$ is manipulated in turn, possibly with a different reference state $R_2$ (in fact with $K \oplus R_2$, since only $M$ is known). For instance, when considering the first AddRoundKey function at the beginning of the AES algorithm embedded on an 8-bit processor, it is obvious that such a method leads to the whole key masked by the constant reference byte $R_2$. If $R_2$ is the same for all the key bytes, which is highly plausible, only $2^8$ possibilities remain to be tested by exhaustive search to infer the entire key material. This complementary brute force may be avoided if $R_2$ is determined by other means or known to be always equal to 0 (on certain chips).
This attack is not restricted to the $\oplus$ operation. It also applies to many other operators often encountered in secret-key cryptography. For instance, other arithmetic or logical operations and look-up tables (LUT) can be treated in the same manner by using $H(\mathrm{LUT}(M \star K) \oplus R)$, where $\star$ represents the involved function, i.e. $\oplus$, $+$, $-$, OR, AND, or whatever operation. Note that the ambiguity between $K$ and $K \oplus R$ is completely removed by the substitution boxes encountered in secret-key algorithms, thanks to the non-linearity of the corresponding LUT: this may require exhausting both $K$ and $R$, but only once for $R$ in most cases. To conduct an analysis in the best conditions, we emphasize the benefit of correctly modeling the whole machine word that is actually handled, together with its transition with respect to the reference state $R$, which is to be determined as an unknown of the problem.
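A minimal sketch of this procedure for the XOR case, with hypothetical values for the key byte and reference state: correlating against $H(M \oplus c)$ for every candidate $c$ peaks at $c = K \oplus R$, not at $K$ itself.

```python
import numpy as np

rng = np.random.default_rng(2)

def hw(x):
    return np.unpackbits(np.asarray(x, dtype=np.uint8).reshape(-1, 1), axis=1).sum(axis=1)

K, R = 0x5A, 0xB7                        # hypothetical secret key byte and reference state
M = rng.integers(0, 256, size=500)       # known random message bytes
W = hw((K ^ M) ^ R) + rng.normal(size=M.size)   # leakage while D = K xor M is handled

# Exhaustive scan over the 256 candidates; a > 0 here, so the plain maximum is unique
rho = [np.corrcoef(W, hw(M ^ c))[0, 1] for c in range(256)]
best = int(np.argmax(rho))
print(hex(best), hex(K ^ R))             # both print 0xed
```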
5 Estimation
In a real case with a set of $N$ power curves $W_i$ and $N$ associated random data words $M_i$, for a given reference state $R$ the known data words produce a set of $N$ predicted Hamming distances $H_{i,R} = H(M_i \oplus R)$. An estimate $\hat\rho_{WH}$ of the correlation factor $\rho_{WH}$ is given by the following formula:
$$\hat\rho_{WH}(R) = \frac{N \sum W_i H_{i,R} - \sum W_i \sum H_{i,R}}{\sqrt{N \sum W_i^2 - \left(\sum W_i\right)^2}\,\sqrt{N \sum H_{i,R}^2 - \left(\sum H_{i,R}\right)^2}}$$
where the summations are taken over the $N$ samples ($i = 1, \dots, N$) at each time step within the power traces $W_i(t)$.
It is theoretically difficult to compute the variance of the estimator $\hat\rho_{WH}$ with respect to the number of available samples $N$. In practice a few hundred experiments suffice to provide a workable estimate of the correlation factor. $N$ has to be increased with the model variance $m/4$ (higher on a 32-bit architecture) and, obviously, in the presence of measurement noise. The results below will show that this is more than enough for conducting reliable tests. The reader is referred to [5] for further discussion of the estimation on experimental data and of optimality issues. It is shown there that this approach can be seen as a maximum-likelihood model-fitting procedure when $R$ is exhausted to maximize $\hat\rho_{WH}$.
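The estimator is simply the sample Pearson coefficient evaluated at every time sample of the traces. A vectorized sketch follows; the trace matrix is assumed to come from a hypothetical acquisition, and hw is the Hamming-weight helper from the earlier snippets.

```python
import numpy as np

def cpa_trace(W, H):
    """Estimate rho_WH(t) at every time sample, following the formula above.

    W : (N, T) array -- N acquired power traces of T points each (assumed available)
    H : (N,)   array -- predicted Hamming distances H(M_i xor R) for one candidate R
    """
    N = len(H)
    num = N * (H @ W) - H.sum() * W.sum(axis=0)
    den_W = np.sqrt(N * (W ** 2).sum(axis=0) - W.sum(axis=0) ** 2)
    den_H = np.sqrt(N * (H ** 2).sum() - H.sum() ** 2)
    return num / (den_W * den_H)

# Usage sketch: rank reference-state candidates by their peak correlation
# rho_max = {R: cpa_trace(W, hw(M ^ R)).max() for R in range(256)}
```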
6 Experimental Results
This section aims at confronting the leakage model with real experiments. General rules of behavior are derived from the analysis of various chips for secure devices conducted during the past years.
Our first experiment was performed on a basic XOR algorithm implemented in an 8-bit chip known for leaking information (which makes it more suitable for didactic purposes). The sequence of instructions was simply the following:
– load a byte $D_1$ into the accumulator;
– XOR $D_1$ with a constant $D_2$;
– store the result from the accumulator to a destination memory cell.
The program was executed 256 times with $D_1$ varying from 0 to 255. As displayed in Figure 1, two significant correlation peaks were obtained with two different reference states: the first one being the address of $D_1$, the second one the opcode of the XOR instruction. These curves bring experimental evidence of the leakage principles that previous works merely hinted at without going into further detail [16, 8, 6, 17]. They illustrate the most general case of a transfer sequence on a common bus: the address of a data word is transmitted just before its value, which is in turn immediately followed by the opcode of the next instruction being fetched.
Fig. 1. Upper: consecutive correlation peaks for two different reference states. Lower: for varying data (0–255), model array and measurement array taken at the time of the second correlation peak.
Such a behavior can be observed on a wide variety of chips,
even those implementing 16- or 32-bit architectures. Correlation rates ranging from 60% to more than 90% can often be obtained. Figure 2 shows an example of partial correlation on a 32-bit architecture: when only 4 bits are predicted among 32, the correlation loss is roughly a factor of $\sqrt{8}$, which is consistent with the displayed correlations.
This sort of result can be observed on various technologies and implementations. Nevertheless the following restrictions have to be mentioned:
– Sometimes the reference state is systematically 0. This can be assigned to so-called pre-charged logic, where the bus is cleared between each transferred value. Another possible reason is that complex architectures implement separate busses for data and addresses, which may prohibit certain transitions. In all those cases the Hamming weight model is recovered as a particular case of the more general Hamming distance model.
Fig. 2. Two correlation peaks for full word (32 bits) and partial (4 bits) predictions.
According to theory the 20% peak should rather be around 26%.
– The sequence of correlation peaks may sometimes be blurred or spread over time in the presence of a pipeline.
– Some recent technologies implement hardware security features designed to impede statistical power analysis. These countermeasures offer various levels of efficiency, going from the most naive and easy to bypass to the most effective, which merely cancel any data dependency.
There are different kinds of countermeasures, which are completely similar to those designed against DPA.
– Some of them consist in introducing desynchronization in the execution of the process, so that the curves are no longer aligned within the same acquisition set. For that purpose there exist various techniques such as fake cycle insertion, unstable clocking or random delays [6, 18]. In certain cases their effect can be corrected by applying appropriate signal processing.
– Other countermeasures consist in blurring the power traces with additional noise or filtering circuitry [19]. Sometimes they can be bypassed by curve selection and/or averaging, or by using another side channel such as electromagnetic radiation [9, 1].
– The data can also be ciphered dynamically during a process by hardware (such as bus encryption) or software means (data masking with a random value [11, 7, 20, 10]), so that the handled variables become unpredictable: then no correlation can be expected anymore. In theory sophisticated attacks such as higher-order analysis [15] can overcome the data masking method, but they are easy to thwart in practice, for instance by using desynchronization.
Indeed, if implemented alone, none of these countermeasures can be considered absolutely secure against statistical analyses. They just increase the amount of effort and the level of expertise required to achieve an attack. However, combined defenses, implementing at least two of these countermeasures, prove to be very efficient and practically dissuasive. The state of the art of countermeasures in the design of tamper-resistant devices has made big advances in recent years. It is now admitted that security requirements include sound implementations as much as robust cryptographic schemes.
7 Comparison with DPA
This section addresses the comparison of the proposed CPA method with Differential Power Analysis (DPA). It refers to the former works by Messerges et al. [16, 17], who formalized the ideas previously suggested by Kocher [12, 13]. A critical study is proposed in [5].
7.1 Practical problems with DPA: the “ghost peaks”
Hereafter we consider the practical implementation of DPA against the DES substitutions (first round). In fact this well-known attack works well only if the following assumptions are fulfilled:
1. Word space assumption: within the word hosting the predicted bit, the contribution of the non-targeted bits is independent of the targeted bit value. Their average influence in the pack of curves for 0 is the same as in the pack of curves for 1, so the attacker does not need to care about these bits.
2. Guess space assumption: the predicted value of the targeted bit for any wrong sub-key guess does not depend on the value associated with the correct guess.
3. Time space assumption: the power consumption $W$ does not depend on the value of the targeted bit except when it is explicitly handled.
But when confronted with experiment, the attack comes up against the following facts.
Fact A. For the correct guess, DPA peaks also appear when the targeted bit is not explicitly handled. This is worth noticing, albeit not really embarrassing; it nevertheless contradicts the third assumption.
Fact B. Some DPA peaks also appear for wrong guesses: they are called “ghost peaks”. This fact is more problematic for making a sound decision and contradicts the second assumption.
Fact C. The true DPA peak given by the right guess may be smaller than some ghost peaks, and even null or negative! This seems somewhat astonishing and quite confusing for an attacker. The reasons must be sought in the crudeness of the optimistic first assumption.
7.2 The “ghost peaks” explanation
With the help of a thorough analysis of the substitution boxes and the Hamming distance model, it is now possible to explain the observed facts and to show how wrong the basic assumptions of DPA can be.
Fact A. As a matter of fact, some data handled along the algorithm may be partially correlated with the targeted bit. This is not that surprising when looking at the structure of the DES. A bit taken from the output nibble of an SBox has a lifetime lasting at least until the end of the round (and beyond, if the left part of the IP output does not vary too much). A DPA peak rises each time this bit and its 3 peer bits undergo the subsequent P permutation, since they all belong to the same machine word.
Fact B. The reason why wrong guesses may generate DPA peaks is that the distributions of an SBox output bit for two different guesses are deterministic and thus possibly partially correlated. The following example is very convincing on that point. Consider the leftmost bit of the fifth SBox of the DES when the input data $D$ varies from 0 to 63 and is combined with two different sub-keys: $\mathrm{MSB}(\mathrm{SBox}_5(D \oplus \mathtt{0x00}))$ and $\mathrm{MSB}(\mathrm{SBox}_5(D \oplus \mathtt{0x36}))$. Both series of bits are listed hereafter, with their bitwise XOR on the third line:
1101101010010110001001011001001110101001011011010101001000101101
1001101011010110001001011101001010101101011010010101001000111001
0100000001000000000000000100000100000100000001000000000000010100
The third line contains 8 set bits, revealing only eight prediction errors among 64. This example shows that a wrong guess, say 0, can provide a good prediction at a rate of 56/64, which is not that far from the correct guess 0x36. The result would be equivalent for any other pair of sub-keys $K$ and $K \oplus \mathtt{0x36}$. Consequently, a substantial concurrent DPA peak will appear at the same location as the right one. The weakness of the contrast will disturb the ranking of the guesses, especially in the presence of a high SNR.
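This inter-guess agreement can be checked directly; a minimal sketch that recomputes the error count from the two series printed above (the strings are copied verbatim from the text):

```python
# The two prediction series for guesses 0x00 and 0x36, as printed above
s_k00 = "1101101010010110001001011001001110101001011011010101001000101101"
s_k36 = "1001101011010110001001011101001010101101011010010101001000111001"

errors = sum(a != b for a, b in zip(s_k00, s_k36))
print(f"{errors} prediction errors out of {len(s_k00)}")        # 8 out of 64
print(f"agreement rate: {len(s_k00) - errors}/{len(s_k00)}")    # 56/64
```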
Fact C. DPA implicitly considers the word bits carried along with the targeted bit as uniformly distributed and independent of the targeted one. This is erroneous because the implementation introduces a deterministic link between their values. Their asymmetric contribution may affect the height and the sign of a DPA peak. This may influence the analysis on the one hand by shrinking relevant peaks, on the other hand by enhancing meaningless ones. There exists a well-known trick to bypass this difficulty, as mentioned in [4]. It consists in shifting the DPA attack a little further into the processing and performing the prediction just after the end of the first round, when the right part of the data (32 bits) is XORed with the left part of the IP output. As the message is chosen freely, this represents an opportunity to re-balance the loss of randomness by bringing in fresh random data. But this does not fix Fact B in the general case.
To get rid of these ambiguities, the model-based approach aims at taking the whole information into account. This requires introducing the notion of algorithmic implementation, which the assumptions of DPA completely occult.
When considering the substitution boxes of the DES, one cannot avoid recalling that the output values are 4-bit values. Although these 4 bits are in principle equivalent as DPA selection bits, they live together with 4 other bits in the context of an 8-bit microprocessor. Efficient implementations tend to exploit those 4 bits to save storage space in constrained environments like smart card chips. A trick referred to as “SBox compression” consists in storing 2 SBox values within the same byte, so that the required space is halved. There are different ways to implement this. Consider for instance the first 2 boxes: instead of allocating 2 different arrays, it is more efficient to build up the following look-up table: $\mathrm{LUT}_{12}(k) = \mathrm{SBox}_1(k) \,\|\, \mathrm{SBox}_2(k)$. For a given input index $k$, the array byte contains the values of two neighboring boxes. Then, according to the Hamming distance consumption model, the power trace should vary like (a code sketch of this model follows the list):
– $H(\mathrm{LUT}_{12}(D_1 \oplus K_1) \oplus R_1)$ when computing $\mathrm{SBox}_1$;
– $H(\mathrm{LUT}_{12}(D_2 \oplus K_2) \oplus R_2)$ when computing $\mathrm{SBox}_2$.
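A minimal sketch of this compressed-table leakage model, assuming hypothetical flat 64-entry arrays SBOX1 and SBOX2 of 4-bit outputs (placeholders here; a real implementation would fill them from FIPS 46-3 in its own layout):

```python
def hw(x):
    """Hamming weight of a small non-negative integer."""
    return bin(x).count("1")

# Placeholder 4-bit output tables (assumptions; fill from the actual implementation).
# The compression packs SBox1 in the high nibble and SBox2 in the low nibble.
SBOX1 = [0] * 64
SBOX2 = [0] * 64
LUT12 = [(SBOX1[k] << 4) | SBOX2[k] for k in range(64)]

def predicted_leakage_sbox1(d1, k1, r1):
    """Hamming-distance prediction when SBox1 is computed: the whole byte is
    read, so SBox2's nibble contributes even though only SBox1 is targeted."""
    return hw(LUT12[(d1 ^ k1) & 0x3F] ^ r1)

def predicted_leakage_sbox2(d2, k2, r2):
    return hw(LUT12[(d2 ^ k2) & 0x3F] ^ r2)
```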
If the values are bound like this, their respective bits cannot be considered independent anymore. To prove this assertion we conducted an experiment on a real 8-bit implementation that was not protected by any DPA countermeasure. Working in a “white box” mode, the model parameters had previously been calibrated with respect to the measured consumption traces. The reference state $R = \mathtt{0xB7}$ had been identified as the opcode of an instruction transferring the content of the accumulator to RAM using direct addressing. The model fitted the experimental data samples quite well; their correlation factor even reached 97%. So we were able to simulate the real consumption of the SBox output with high accuracy. The study then consisted in applying a classical single-bit DPA to the output of $\mathrm{SBox}_1$, in parallel, on both sets of 200 data samples: the measured and the simulated power consumptions.
As Figure 3 shows, the simulated and experimental DPA biases match particularly well. One can notice the following points:
– The 4 output bits are far from being equivalent.
– The polarity of the peak associated with the correct guess 24 depends on the polarity of the reference state. As $R = \mathtt{0xB7}$, its leftmost nibble aligned with $\mathrm{SBox}_1$ is 0xB = ‘1011’, and only selection bit 2 (counted from the left) results in a positive peak, whereas the 3 others undergo a transition from 1 to 0, leading to a negative peak.
– In addition, this bit is a somewhat lucky bit, because when it is used as the selection bit only guess 50 competes with the right sub-key. This is a particularly favorable case occurring here on $\mathrm{SBox}_1$, partly due to the set of 200 used messages; it cannot be extrapolated to other boxes.
– The dispersion of the DPA bias over the guesses is quite confused (see bit 4).
The quality of the modeling proves that those facts cannot be blamed on the number of acquisitions. Increasing it much beyond 200 does not help: the level of the peaks with respect to the guesses does not evolve and converges to the same ranking. This particular counter-example proves that the ambiguity of DPA does not lie in imperfect estimation but in wrong basic hypotheses.
Fig. 3. DPA biases on SBox1 versus guesses for selection bits 1, 2, 3 and 4, on modeled and experimental data; the correct guess is 24.
7.3 Results of Model Based CPA
For comparison, the table hereafter provides the ranking of the first 6 guesses sorted by decreasing correlation rate. This result is obtained with as few as 40 curves! The full key is 11 22 33 44 55 66 77 88 in hexadecimal format and the corresponding sub-keys at the first round are 24, 19, 8, 8, 5, 50, 43, 2 in decimal representation.
SBox1    SBox2    SBox3    SBox4    SBox5    SBox6    SBox7    SBox8
K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax
24  92%  19  90%   8  87%   8  88%   5  91%  50  92%  43  89%   2  89%
48  74%  18  77%  18  69%  44  67%  32  71%  25  71%  42  76%  28  77%
01  74%  57  70%  05  68%  49  67%  25  70%  05  70%  52  70%  61  76%
33  74%  02  70%  22  66%  02  66%  34  69%  54  70%  38  69%  41  72%
15  74%  12  68%  58  66%  29  66%  61  67%  29  69%   0  69%  37  70%
06  74%  13  67%  43  65%  37  65%  37  67%  53  67%  30  68%  15  69%
This table shows that the correct guess always stands out with a good contrast. Therefore a sound decision can be made without any ambiguity, despite a rough estimation of ρmax.
A similar attack has also been conducted on a 32-bit implementation, in white-box mode, with perfect knowledge of the implemented substitution tables and of the reference state, which was 0. The key was 7C A1 10 45 4A 1A 6E 57 in hexadecimal format and the related sub-keys at the first round were 28, 12, 43, 0, 15, 60, 5, 38 in decimal representation. The number of curves is 100. As the next table shows, the contrast between the correct guess and the most competing wrong guess is good (around 40% on boxes 1 to 4). The correlation rate is not as high on boxes 5 to 8, definitely because of partial and imperfect modeling, but it remains exploitable and is thus a robust indicator. When the number of bits per machine word is greater, the contrast between the guesses is relatively enhanced, but finding the right model could be more difficult in a black-box mode.
SBox1    SBox2    SBox3    SBox4    SBox5    SBox6    SBox7    SBox8
K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax  K  ρmax
28  77%  12  69%  43  73%   0  82%  15  52%  60  51%   5  51%  38  47%
19  36%  27  29%  40  43%  29  43%  03  33%  10  34%  15  40%  05  29%
42  35%  24  27%  36  35%  20  35%  58  30%  58  33%   6  29%  55  26%
61  31%  58  27%  06  33%  60  32%  10  30%  18  31%  12  29%  39  25%
8 Conclusion
Our experience on a large set of smart card chips over the last years has convinced us of the validity of the Hamming distance model and of the advantages of the CPA method over DPA, in terms of efficiency, robustness and number of experiments. An important and reassuring conclusion is that all the countermeasures designed against DPA offer the same defensive efficiency against the model-based CPA attack. This is not that surprising, since those countermeasures aim at undermining the common prerequisites that both approaches rely on: side-channel observability and intermediate variable predictability.
The main drawback of CPA regards the characterization of the leakage model parameters. As it is more demanding than DPA, the method may seem more difficult to implement. However, it may be objected that:
– A statistical power analysis of any kind is never conducted blindly, without any preliminary reverse engineering (process identification, bit tracing): this is the opportunity to quantify the leakage rate by CPA on known data.
– DPA requires more sample curves anyway, since all the unpredicted data bits penalize the signal-to-noise ratio (see [5]).
– If DPA fails for lack of implementation knowledge (increasing the number of curves does not necessarily help), we have shown how to infer part of this information without excessive effort: for instance, the reference state is to be found by exhaustive search only once in general.
– There exist many situations where the implementation variants (like the SBox implementation in DES) are not so numerous, because of operational constraints.
– If part of the model cannot be inferred (SBox implementation in DES, hardware co-processor), partial correlation with the remainder may still provide exploitable indications.
Eventually, DPA remains relevant in the case of very special architectures for which the model may be completely out of reach, as in certain hard-wired co-processors.
References
1. D. Agrawal, B. Archambeault, J.R. Rao, and P. Rohatgi. The EM side channel(s): Attacks and assessment methodologies. In Cryptographic Hardware and Embedded Systems — CHES 2002, LNCS 2523, pp. 29–45, Springer-Verlag, 2002. See also http://www.research.ibm.com/intsec/emf-paper.ps.
2. M.L. Akkar, R. Bévan, P. Dischamp, and D. Moyart. Power analysis, what is now possible... In Advances in Cryptology — ASIACRYPT 2000, LNCS 1976, pp. 489–502, Springer-Verlag, 2000.
3. M.L. Akkar and C. Giraud. An Implementation of DES and AES secure against
some attacks. In Cryptographic Hardware and Embedded Systems — CHES 2001,
LNCS 2162 pp. 309–318, Springer-Verlag, 2001.
4. R. Bévan and R. Knudsen. Ways to enhance differential power analysis. In Information Security and Cryptology — ICISC 2002, LNCS 2587, pp. 327–342, Springer-Verlag, 2002.
5. E. Brier, C. Clavier, and F. Olivier. Optimal statistical power analysis.
http://eprint.iacr.org/2003/152/.
6. C. Clavier, J.-S. Coron, and N. Dabbous. Differential power analysis in the pres-
ence of hardware countermeasures. In Cryptographic Hardware and Embedded Sys-
tems — CHES 2000, LNCS 1965, pp. 252–263, Springer-Verlag, 2000.
7. J.-S. Coron and L. Goubin. On Boolean and arithmetic masking against differen-
tial power analysis. In Cryptographic Hardware and Embedded Systems — CHES
2000, LNCS 1965, pp. 231–237, Springer-Verlag, 2000.
8. J.-S. Coron, P. Kocher, and D. Naccache. Statistics and secret leakage. In Finan-
cial Cryptography (FC 2000), LNCS 1972, pp. 157–173, Springer-Verlag, 2001.
9. K. Gandolfi, C. Mourtel, and F. Olivier. Electromagnetic attacks: Concrete re-
sults. In Cryptographic Hardware and Embedded Systems — CHES 2001, LNCS
2162, pp. 252–261, Springer-Verlag, 2001.
10. J. Golić and C. Tymen. Multiplicative masking and power analysis of AES. In Cryptographic Hardware and Embedded Systems — CHES 2002, LNCS 2523, pp. 198–212, Springer-Verlag, 2002.
11. L. Goubin and J. Patarin. DES and differential power analysis. In Cryptographic
Hardware and Embedded Systems (CHES ’99), LNCS 1717, pp. 158–172, Springer-
Verlag, 1999.
12. P. Kocher, J. Jaffe, and B. Jun. Introduction to differential power analysis and
related attacks. http://www.cryptography.com.
13. P. Kocher, J. Jaffe, and B. Jun. Differential power analysis. In Advances in Cryptology — CRYPTO ’99, LNCS 1666, pp. 388–397, Springer-Verlag, 1999.
14. R. Mayer-Sommer. Smartly analysing the simplicity and the power of simple
power analysis on smartcards. In Cryptographic Hardware and Embedded Systems
— CHES 2000. LNCS 1965, pp. 78–92, Springer-Verlag, 2000.
15. T.S. Messerges. Using second-order power analysis to attack DPA resistant soft-
ware. In Cryptographic Hardware and Embedded Systems — CHES 2000. LNCS
1965, pp. 238–252, Springer-Verlag, 2000.
16. T. Messerges, E. Dabbish, and R. Sloan. Investigation of power analysis at-
tacks on smartcards. In Usenix Workshop on Smartcard Technology 1999.
http://www.usenix.org.
17. T. Messerges, E. Dabbish, and R. Sloan. Examining smart-card security under
the threat of power analysis attacks. IEEE Transactions on Computers, 51(5):
541–552, May 2002.
18. E. Oswald. On Side-Channel Attacks and the Application of Algorithmic Coun-
termeasures. PhD Thesis, Faculty of Science of the University of Technology Graz
(IAIK-TUG), Austria, May 2003.
19. A. Shamir. Protecting smart cards from passive power analysis with detached
power supplies. In Cryptographic Hardware and Embedded Systems — CHES
2000. LNCS 1965, pp. 71–77, Springer-Verlag, 2000.
20. E. Trichina, D. De Seta, and L. Germani. Simplified adaptive multiplicative mask-
ing for AES. In Cryptographic Hardware and Embedded Systems — CHES 2002,
LNCS 2523, pp. 187–197, Springer-Verlag, 2002.