Getting More from PCA:
First Results of Using Principal Component
Analysis for Extensive Power Analysis
Lejla Batina¹,², Jip Hogenboom³,*, and Jasper G.J. van Woudenberg⁴
¹ Radboud University Nijmegen, ICIS/Digital Security group
Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
lejla@cs.ru.nl
² K.U.Leuven ESAT/SCD-COSIC and IBBT
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
lejla.batina@esat.kuleuven.be
³ KPMG Advisory N.V.
Laan van Langerhuize 1, 1186 DS Amstelveen, The Netherlands
Hogenboom.Jip@kpmg.nl
⁴ Riscure BV
Delftechpark 49, 2628 XJ Delft, The Netherlands
vanwoudenberg@riscure.com
Abstract. Differential Power Analysis (DPA) is commonly used to obtain
information about the secret key used in cryptographic devices.
Countermeasures against DPA can cause power traces to be misaligned,
which reduces the effectiveness of DPA. Principal Component Analysis
(PCA) is a powerful tool, which is used in different research areas to
identify trends in a data set. Principal Components are introduced to
describe the relationships within the data. The largest principal
components capture the data with the largest variance. These Principal
Components can be used to reduce the noise in a data set or to transform
the data set in terms of these components. We propose the use of
Principal Component Analysis to improve the correlation for the correct key
guess for DPA attacks on software DES traces and show that it can also
be applied for other algorithms. We also introduce a new way of
determining key candidates by calculating the absolute average value of the
correlation traces after a DPA attack on a PCA-transformed trace. We
conclude that Principal Component Analysis can successfully be used as
a preprocessing technique to reduce the noise in a trace set and improve
the correlation for the correct key guess using Differential Power Analysis
attacks.
Keywords: Side-channel cryptanalysis, DPA, countermeasures, PCA.
* This work was done when the author was with Radboud University Nijmegen.
O. Dunkelman (Ed.): CT-RSA 2012, LNCS 7178, pp. 383–397, 2012.
© Springer-Verlag Berlin Heidelberg 2012
1 Introduction
Side-channel attacks are indirect methods which are used to find secret keys in
cryptographic devices. On these devices, cryptographic algorithms are
implemented to ensure encrypted communication. Smart cards can sometimes
contain a software implementation of a cryptographic algorithm, but often they
also include a cryptographic co-processor, whereas larger devices usually have
dedicated hardware implementations. The secret keys used for the algorithms
are usually well protected within these devices.
A widely used method to recover secret keys is to exploit side-channel
information. Side-channel attacks make use of leaked physical information, such
as power consumption, electromagnetic (EM) radiation etc. This information is
leaked because of weaknesses in the physical implementation of the algorithm.
An example of a widely used side-channel attack is Differential Power
Analysis (DPA) [8]. The power consumption of a cryptographic device is dependent
on the data being processed and in particular on the secret key that is used for
encryption (decryption). This power consumption is measured while the secret
key is manipulated within a cryptographic device and the corresponding power
traces are collected. Performing DPA on the traces involves computing statistics
over the power measurements and the modeled traces. In this way, the attacker
applies a side-channel distinguisher (e.g. DoM [8], Pearson correlation
coefficient [3], Mutual Information [5] etc.) to the actual traces and the
predictions for the measurements in order to test a hypothesis about (a part
of) the cryptographic key in use.
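As a rough illustration of how such a distinguisher tests key hypotheses, the
sketch below runs a correlation-based attack on synthetic traces. Everything in
it is an assumption made for illustration: the toy 4-bit S-box, the
Hamming-weight leakage model, the noise level, the leaking sample index and the
function name correlation_traces are not taken from the experiments in this paper.

```python
import numpy as np

# Toy 4-bit S-box, used purely for illustration (not DES).
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW = np.array([bin(v).count("1") for v in range(16)])  # Hamming weights of 0..15


def correlation_traces(traces, predictions):
    """Pearson correlation between every key guess and every sample.

    traces: (num_traces, num_samples), predictions: (num_traces, num_guesses).
    Returns an array of shape (num_guesses, num_samples).
    """
    t = traces - traces.mean(axis=0)
    p = predictions - predictions.mean(axis=0)
    norms = np.outer(np.linalg.norm(p, axis=0), np.linalg.norm(t, axis=0))
    return (p.T @ t) / norms


rng = np.random.default_rng(0)
num_traces, num_samples, leak_at, true_key = 500, 20, 7, 0xB

plaintexts = rng.integers(0, 16, size=num_traces)
traces = 0.3 * rng.normal(size=(num_traces, num_samples))   # measurement noise
traces[:, leak_at] += HW[SBOX[plaintexts ^ true_key]]       # assumed leakage model

# Predicted leakage for each of the 16 key guesses (one column per guess).
guesses = np.arange(16)
predictions = HW[SBOX[plaintexts[:, None] ^ guesses[None, :]]].astype(float)

corr = correlation_traces(traces, predictions)
best_guess = int(np.argmax(np.abs(corr).max(axis=1)))       # highest-peak rule
```

Each row of corr is the correlation trace of one key guess; the guess whose
trace contains the highest absolute peak is taken as the key candidate.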
To defend against side-channel analysis, manufacturers of cryptographic
devices usually implement countermeasures on their devices to complicate DPA
substantially. Common methods include masking the sensitive values of data, i.e.
the variables depending on known data (e.g. plaintext) and the hypothesized key,
and hiding the dependency of power on data [9] at specific moments in time. The
latter can be obtained by various means, e.g. random process interrupts (RPI) [4],
random process order, unstable clocks etc. For example, when RPI are used as a
countermeasure the position of the leakage that is exploited by DPA can shift a
few clock cycles. In this way, locating the specific time points where the key
is processed is further obfuscated. Due to all those countermeasures,
preprocessing power traces has become an important step in side-channel analysis.
Principal Component Analysis (PCA) [7] is a technique which is widely used
to reduce the noise or the dimensionality in a data set, while retaining the most
variance. PCA results in a new ordered set of vectors that form an orthogonal
basis for a data set. Each basis vector, or Principal Component (PC), captures
at least as much variance as any of the PCs that follow it. PCA is used in many
different domains such as gene analysis and face recognition.
An interesting property of PCA in the context of trace set analysis is that
correlating samples in time are projected onto one or a few PCs. As time-domain
traces often have multiple samples where leakage is present, we hypothesize
that these samples will be projected onto only a few PCs. This implies that
effective filtering strategies are possible when other PCs are filtered out, and
additionally Correlation Power Analysis (CPA) could be performed on
PCA-transformed traces. In this paper we explore these ideas. We show several
directions for the PCA tools to improve side-channel analysis in preprocessing
as well as in the actual analysis.
We first use PCA to transform trace sets such that, even when leakage
of the key bits (through the sensitive variables) appears at different points in
time, the trace set can still be analyzed with DPA. As a PCA transform can reduce
the dimension of the trace set, we also show how to transform the original data
to a new trace set in terms of the Principal Components. After applying this
transformation, most of the variance within the data is included in the first part
(the first Principal Components) of the transformed trace set. This fact is also
used by [14], where the authors defined a new side-channel distinguisher based
on the first principal component.
There have been several attempts to deploy PCA in side-channel cryptanalysis,
but its full potential is yet to be realized. The first investigation was
performed by Bohy et al. [2]. They considered PCA as a method to improve
power attacks. However, their results cover the effects of PCA on SPA only,
while further studies extend to PCA on DPA and CPA.
Archambeau et al. [1] used PCA for template attacks. In this approach the
traces are first transformed by PCA in order to perform interest point selection.
Indeed, in the preprocessing phase the attacker builds templates in order to
complete the profiling phase by using a clone of the device under attack. Then,
the templates are used to mount an attack on the real device. The top
principal components are used to capture the maximum variance between different
template classes.
In contrast to [1], Souissi et al. [14] used PCA not as a preprocessing tool but as
a common side-channel distinguisher. The new distinguisher has the usual steps
of differential power analysis (DPA [8] or CPA [3]), consists of a computational
phase only and does not require an identical device for profiling.
Our Contribution. Our work does not consider the scenario of template
attacks (which are assumed to be the strongest side-channel attacks [15]), nor do
we deploy PCA as yet another distinguisher. We introduce PCA as a suitable
preprocessing technique on the power traces to enhance the results of DPA. The
two benefits of PCA we observe are noise reduction and a PCA transformation
(leading to more efficient DPA). Both were analyzed, and several experiments
were performed on unprotected implementations and on implementations with
countermeasures. Additionally, we investigate the suitability of PCA on
misaligned traces, where our experiments show good results when compared to
e.g. static alignment. We conclude that PCA has much potential in the field of
side-channel cryptanalysis and we expect more research to evolve.
The remainder of this paper is organized as follows. Section 2 describes some
background on PCA and its applications. Our experiments with PCA related to
DPA are described in Sect. 3. We compare our results to some previous work in
Sect. 4. In Sect. 5 we conclude this work and discuss our findings.
2 Principal Component Analysis
Principal Component Analysis (PCA) is a multivariate technique that is widely
used to reduce the noise or the dimensionality in a data set, while retaining the
most variance. The origin of PCA goes back more than 100 years to Pearson
[12], and a later relevant formulation is due to Hotelling [6].
PCA computes a set of new orthogonal variables (by means of eigenvectors)
with decreasing variances within the data set, producing Principal
Components (PC). The largest variance is captured by the first component. PCA is
used in many different domains such as gene analysis and face recognition.
When we consider PCA in terms of power measurements, we have a data set
where the dimensionality is equal to the number of samples and the number of
observations is equal to the number of traces. This means that the number of
Principal Components which can be deduced from a power trace is (at most)
equal to the number of samples.
The main drawback of PCA is that a covariance matrix of n ∗ n (where n
is the number of samples) must be calculated. This means that the calculation
time increases quadratically relative to the number of samples.
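To make the quadratic growth concrete, the following sketch counts the
covariance matrix entries for the two trace lengths used later in the
experiments (312 and 2081 samples). The 8-byte entry size is an assumption
(double-precision storage) and the helper name is hypothetical.

```python
def covariance_entries(num_samples):
    """Number of entries in the num_samples x num_samples covariance matrix."""
    return num_samples * num_samples


BYTES_PER_ENTRY = 8  # assumption: double-precision floating point values

for n in (312, 2081):
    entries = covariance_entries(n)
    print(n, entries, entries * BYTES_PER_ENTRY)
```

Going from 312 to 2081 samples, a factor of roughly 6.7, increases the number
of entries from 97,344 to 4,330,561, a factor of roughly 44.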
2.1 Example
In order to illustrate the way PCA works, we give a small example for a
two-dimensional (x,y) data set with 50 observations [7]. In Fig. 1 (left) a plot of this
data set is given. The first principal component is required to have the largest
variance. The second component must be orthogonal to the first component
while capturing the largest variance within the data set in that direction. These
components are plotted in Fig. 1. This results in components which are sorted by
variance, where the first component captures the largest variance. If we transform
the data set using these principal components, the plot given in Fig. 1 (right)
will be obtained. This plot clearly shows that there is a larger variation in the
direction of the first principal component.
Fig. 1. A plotted data set with both of its principal components (left) and plot of the
transformed data set with respect to both principal components (right) [7]
2.2 PCA Transformation
In general PCA is used when trying to extract the most interesting
information from data with large dimensions. More precisely, PCA attempts to find a
new representation of the original set by constructing a set of orthogonal
vectors spanning a subspace of the initial space. The new variables, which are linear
combinations of the starting ones, are called principal components.
Power traces usually have large dimensions, and one would like to find the
information of the key leakage within them.
In order to calculate PCA, the following few steps have to be performed [13].
– First, the mean of each of the n dimensions (samples) is computed as the
average over all traces:
$$M_n = \frac{1}{N} \sum_{i=1}^{N} T_{i,n}$$
where $T_{i,n}$ denotes sample n of trace $T_i$, N is the number of traces, and
all traces are considered as n-dimensional vectors.
This mean $M_n$ is afterwards subtracted from each of the dimensions n for
each trace $T_i$:
$$T_{i,n} = T_{i,n} - M_n$$
– The covariance matrix Σ is constructed. A covariance matrix is a matrix
whose (i,j)-th element is the covariance between the i-th and j-th dimension
of each trace. This matrix will be an n ∗ n matrix where n is equal to the
number of samples (dimension) of the power traces. This means that the
calculation time increases quadratically relative to the number of samples.
In general, the covariance for two n-dimensional vectors X and Y is defined
as follows:
$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$
Using the formula for the covariance, the covariance matrix is defined as
follows:
$$\Sigma_{n \times n} = (c_{i,j}), \quad c_{i,j} = \mathrm{Cov}(Dim_i, Dim_j)$$
where $Dim_x$ is the x-th dimension.
– Then the singular value decomposition (SVD), i.e. the eigenvectors and
eigenvalues of the covariance matrix, is calculated:
$$\Sigma = U \Lambda U^{-1}$$
Here Λ is a diagonal matrix (with the eigenvalues on the diagonal) and U is an
orthogonal matrix of eigenvectors of Σ. These eigenvectors and eigenvalues
already provide information about the patterns in the data.
The eigenvector corresponding to the largest eigenvalue is called the first
principal component; this component corresponds to the direction with the
most variance. Since n eigenvectors can be derived, there are n principal
components. They are ordered from high to low based on their eigenvalue.
In this way the most relevant principal components are sorted first.
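The steps above can be sketched in NumPy as follows. This is a minimal sketch
rather than the implementation used for the experiments: the function name pca
and the random demo data are assumptions, and numpy.linalg.eigh is used for the
eigendecomposition. The singular values of the mean-adjusted data yield the
same eigenvalues, which the last lines compute as a cross-check.

```python
import numpy as np


def pca(traces):
    """Return (mean, eigenvalues, eigenvectors), sorted by decreasing eigenvalue.

    traces has shape (num_traces, num_samples); eigenvectors are the columns of U.
    """
    traces = np.asarray(traces, dtype=float)
    mean = traces.mean(axis=0)                  # step 1: per-sample mean over all traces
    centered = traces - mean                    # mean-adjusted traces
    cov = np.cov(centered, rowvar=False)        # step 2: n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # step 3: eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder from largest to smallest
    return mean, eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 8))                # synthetic stand-in: 200 traces, 8 samples
mean, eigvals, U = pca(demo)

# Cross-check via SVD of the mean-adjusted data: eigenvalue = s**2 / (N - 1).
singular_values = np.linalg.svd(demo - mean, compute_uv=False)
svd_eigvals = singular_values ** 2 / (demo.shape[0] - 1)
```

Because U is orthogonal, its inverse equals its transpose, so the decomposition
can be verified as U Λ Uᵀ without computing a matrix inverse.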
In order to reduce the dimension, we can optionally choose the first p components
and form a matrix with these vectors in the columns. This matrix is called the
feature vector. In the literature, several tests are known that help in deciding
on the number of components to keep.
With these p feature vectors we have two choices. The original data can be
transformed to retain only p dimensions, or the noise of the original data set can
be reduced using some components while keeping all n dimensions.
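Both choices can be sketched as follows, assuming synthetic data and p = 2;
the variable names are illustrative only. The last line computes the squared
reconstruction error, which for PCA equals the sum of the eigenvalues of the
discarded components when normalised by N − 1 like the covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 100 observations, 6 dimensions with decreasing spread.
X = rng.normal(size=(100, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])
mean = X.mean(axis=0)
Xc = X - mean                                   # mean-adjusted data

eigvals, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]        # largest eigenvalue first

p = 2
U_p = U[:, :p]                                  # feature vector: first p eigenvectors

Y = Xc @ U_p                                    # choice 1: data reduced to p dimensions
X_denoised = Y @ U_p.T + mean                   # choice 2: all n dimensions, fewer components

err = np.sum((X - X_denoised) ** 2) / (X.shape[0] - 1)
```

Here Y stores one transformed observation per row, i.e. it is the transpose of
the matrix obtained when the data is multiplied from the left by the transposed
feature vector.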
2.3 Assumptions and Properties of PCA-Transformed Data
When using this technique we are accepting some ground assumptions behind
PCA and correspondingly we have to carefully consider situations where the
assumptions are not (fully) valid.
Assumptions of PCA
– Linearity. This assumes that the new vectors, i.e. the components, are linear
combinations of the original ones, so we rely on the same concept for leakage.
– Components with large variances are the most interesting ones. We show in
Sect. 3 that this is not always valid.
– The reduction of the dimension of the original data set does not lead to the
loss of important information. On the contrary, this can lead to better results
e.g. when noise is removed to improve the key recovery.
As time-domain traces often have multiple samples where leakage is present,
we hypothesize that these samples will be projected onto only a few PCs. This
implies that effective filtering strategies are possible when other PCs are filtered
out, and additionally CPA could be performed on PCA-transformed traces. We
investigate this in more detail in the remainder of this paper.
2.4 Multiple Leakage Points and PCA
As mentioned above, PCA computes new variables, i.e. principal components,
which are derived as linear combinations of the original variables. As a
consequence, in the context of power analysis we have the following observation. An
interesting property of PCA in the context of trace set analysis is that
correlating samples in time are projected onto one or a few PCs. For instance, a PC
which has two positive peaks at time t0 and t1 implies that the original trace
set has positively correlating values at time t0 and t1. Also, the larger the PC
(i.e. its eigenvalue) is, the larger the variance at these times is.
To show these properties, we analyze applying PCA transformations to some
simulated power traces. We first create a noisy trace set with three points of
leakage: a point A with leakage, a non-correlating point B with leakage, and a
point C with leakage negatively correlating with peak A. Each trace also has
some low noise added. Figure 2 shows twenty overlapped traces from this set.
We also create a similar set with misaligned peaks (Figure 3). For both trace
sets we calculate the principal components, as shown in Figure 4a and 4b.
First, it is clear that the first principal component captures the correlation
between the peaks A and C, and the second captures peak B. This also implies
that all samples in peak A and C (and similarly peak B) are accumulated into
one dimension after the PCA transformation. This can potentially increase DPA
leakage, which is calculated per dimension. It is interesting that this holds for
both aligned and misaligned traces, which also shows that a transformation could
project misaligned peaks onto a few dimensions.
Second, in the aligned case, the first two principal components capture all
peak information. The other principal components all capture noise. For the
misaligned case, the other principal components represent different shifted peaks.
These experiments show that, even under misalignment, PCA transformations
can project multiple correlating points of leakage onto several PCs. As DPA on
a transformed trace set analyzes PCs separately, this may improve the
signal-to-noise ratio.
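The aligned case can be imitated with a few lines of NumPy. The sketch below is
a simplified stand-in for the simulation described above, not a reproduction of
it: each peak is reduced to a single sample, and the peak positions, noise level
and trace count are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)
num_traces, num_samples = 500, 60
t_a, t_b, t_c = 10, 30, 50                      # hypothetical positions of A, B and C

leak_ac = rng.normal(size=num_traces)           # value leaking at A and, negated, at C
leak_b = rng.normal(size=num_traces)            # independent value leaking at B

traces = 0.05 * rng.normal(size=(num_traces, num_samples))   # low noise everywhere
traces[:, t_a] += leak_ac
traces[:, t_c] -= leak_ac                       # C correlates negatively with A
traces[:, t_b] += leak_b                        # B does not correlate with A or C

eigvals, eigvecs = np.linalg.eigh(np.cov(traces, rowvar=False))
order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
pc1, pc2 = eigvecs[:, order[0]], eigvecs[:, order[1]]
```

In this toy setting pc1 has entries of opposite sign at t_a and t_c, reflecting
the negative correlation between A and C, while pc2 is concentrated at t_b.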
Fig. 2. Aligned traces with three leakage points
Fig. 3. Misaligned traces with three leakage points
Fig. 4. First four Principal Components of transformed trace sets: (a) aligned peaks, (b) misaligned peaks
2.5 Noise Reduction
Due to various countermeasures that aim at making DPA more difficult by e.g.
adding a lot of noise, it is sometimes required to perform a lot of preprocessing
to remove the noise for successful key recovery. In particular, one can use chosen
Principal Components to retain only certain (sensitive) information. One of the
assumptions behind PCA (cf. Sect. 2) is that PCs with larger variances represent
interesting data, while others with lower variances represent noise. Therefore,
the goal is to remove the components which contribute to the noise. The first
step is the same as in a transformation: the feature vector U is transposed and
multiplied with the transposed mean-adjusted data X.
$$Y = U^T X^T = (X U)^T$$
Hence, when extraction of the undesired, i.e. noise-representing, components is
performed, the procedure is as follows. The dimensionality of the data is reduced
to p by projecting each x in X into $y = U_p^T x$, where $U_p$ are the first p columns
of U, and y is a p-vector. The PCA approximation (of the input x) with only p
principal components is then
$$\tilde{x} = \sum_{j=1}^{p} (u_j^T x) \, u_j$$
and the (squared reconstruction) error can be shown to be equal to
$\sum_{i=p+1}^{n} \lambda_i$, that is, the sum of the eigenvalues for the unused eigenvectors.
Choosing the Right Components to Keep. There are extensive discussions
in the literature about the choice of components to keep in order to get
the maximum from using PCA. The ideas and approaches depend heavily on
applications. Since this method is mostly used to find the most distinctive data
(which usually is the data with the most variance), most of the literature deals
with deciding on the number of “smaller” components that can be left
out.
For side-channel analysis, this is not the right route to take. Usually, power
traces contain a lot of noise and this noise typically has a large variation relative
to the DPA information we are looking to find. This means that, depending on
the process of collecting the power measurements to be analyzed, the noise can
also be captured by the largest principal components, especially for “real” trace
sets i.e. the one where countermeasures are deployed. Since we would like to
get rid of the noise, we need to find which principal components we can safely
remove without losing the data relating to the secret key. We address this issue
in our experiments.
3 Experiments
As described in Section 2, PCA can be used in two ways. We can perform a
transformation using PCA and do the analysis on Principal Components, or
we could use only a subset of Principal Components to reduce the noise in the
original trace set. In this section we address both aspects.
We performed our experiments by taking power measurements of a smart card
which contains a software DES implementation. In order to test PCA against
countermeasures, we used an implementation that contains a configurable
countermeasure that introduces random delays. We used a Picoscope 5203 and a
modified smart card reader to obtain power measurements from the smart
card. In order to enhance the signal, we used an analog 48 MHz low-pass filter.
We also performed all experiments on a hardware DES implementation and on
implementations of other cryptographic algorithms, i.e. AES and ECC. These
experiments showed the same results as described in Section 3, which means
that the method is not implementation or algorithm dependent.
3.1 Noise Reduction
We know that the process-related signal within the trace set has a large
variation, so it should be captured by the largest Principal Components. Any noise
in the measurement is captured by smaller PCs. However, this general
observation is not directly applicable to power consumption signals. More precisely, the
exact positions of key-related information differ for various implementations and
platforms.
Nevertheless, it is valuable to find out where the noise-related information
is located, either as a result of some countermeasures or due to the measurement
setups that are used. If we remove these principal components and retain all
others (the ones which contain the key information), we might be able to reduce
the noise, i.e. to improve the signal-to-noise ratio, and therefore enhance the DPA
analysis.
We tested this hypothesis on our set of power measurements of a software
DES implementation. We took 200 traces of 312 samples of the DES encryption.
All countermeasures were turned off, which meant that we already could find
the correct key using a CPA attack. Also, this meant that we knew in which
sample the key leakage was present.
We used this trace set for the rest of our experiments with noise reduction
using PCA. We plotted the principal components to see if they contained any
interesting properties. It appeared that different principal components captured
more or less information for the known key leakage samples. As an example, we
inspected the 15th principal component. It appeared to have a high peak for the
sample with the key leakage of S-box 8, so it contained some information about
the data at that sample location. We tested this hypothesis by performing a
noise reduction retaining all principal components except this 15th. When we
performed a DPA attack on the resulting trace set, the correlation for the correct
key guess for S-box 8 dropped significantly.
Since we expect the largest amount of non-key information to be captured by
the largest principal components, we remove some of these largest components.
When we perform a DPA attack on the resulting noise-reduced trace set, we
can see an increase in the correlation value for the correct key guess. From this
observation, we can conclude that we removed more noise than signal.
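The effect can be reproduced on synthetic data. The sketch below is only an
illustration under strong assumptions: the dominant noise is modelled as a
random offset shared by all samples of a trace, the key-related signal is a
Hamming-weight-like value at one known sample, and a single leading component
is removed. On real traces, which components hold noise depends on the
implementation, as discussed in this section.

```python
import numpy as np

rng = np.random.default_rng(3)
num_traces, num_samples, leak_at = 400, 50, 20

model = rng.binomial(8, 0.5, size=num_traces).astype(float)  # Hamming-weight-like values
drift = 5.0 * rng.normal(size=(num_traces, 1))               # strong noise shared by all samples
traces = drift + 0.5 * rng.normal(size=(num_traces, num_samples))
traces[:, leak_at] += model                                  # small key-related signal

mean = traces.mean(axis=0)
centered = traces - mean
eigvals, U = np.linalg.eigh(np.cov(centered, rowvar=False))
U = U[:, np.argsort(eigvals)[::-1]]                          # largest component first

k = 1                                                        # number of leading PCs to drop
kept = U[:, k:]
denoised = centered @ kept @ kept.T + mean                   # back in the original sample domain

corr_before = np.corrcoef(traces[:, leak_at], model)[0, 1]
corr_after = np.corrcoef(denoised[:, leak_at], model)[0, 1]
```

Because the shared offset dominates the first component, dropping it leaves the
leaking sample far less noisy, and corr_after exceeds corr_before.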
We observed that different components capture the DPA information from
different S-boxes. This means that the best results are obtained if one knows
which component captures the most variation for the sample where the key
leakage is. Subsequently this means that for the best results, one needs to know
the sample with the key leakage. This implies that one should obtain a card of the
same type with a known key to find at which moment in time the information is
exploitable. In this way, a sort of profiling is performed, i.e. templates are created
in order to speed up the key recovery.
Another useful observation concerns software versus hardware implementations.
Our findings show that hardware measurements obtained from a card with a
co-processor are more “noisy”, and the best results were obtained by removing up
to the first 50 components. The exception was a set of measurements obtained
from a SASEBO-R board, where the highest key-dependent leakage was observed
within the 3rd principal component. In conclusion, there is a lot of potential in
PCA for noise reduction, but the threshold for improving the leakage has to be
decided on the basis of a given implementation. Nevertheless, we were able to
improve the leakage in all observed cases.
3.2 PCA Transformation
Whereas during noise reduction we first transform the trace set, remove some
components and then transform the trace set back for further analysis, we could
also only perform the transformation. This will put the Principal Components
on the main axis, which means that all variance that is correlated at different
points in time will be projected onto a few PCs as elaborated above.
CPA Highest-Peak Distinguisher. We used the same trace set as before,
containing 200 traces with 312 samples. To see which effect a transformation has
on the results of a DPA attack, we performed a DPA attack on this transformed
trace set. We found that the correct key guess did not contain the highest peak
in the correlation graph. This means that we are not able to find the correct key
after a PCA transformation when the key guess is selected by the highest peak.
CPA Abs-Avg Distinguisher. However, when we inspected the correlation
graphs for all key guesses, we found that the graph for the correct key guess was
different from the graphs for the wrong key guesses, see Fig. 5. The correlation
for the first samples (which correspond to the higher principal components)
was higher for the correct key guess compared to the wrong key guess. The
main difference is however that the correlation for the lower samples was much
lower for the correct key guess compared to the wrong key guesses. Actually, the
conclusion is that variances are not the same for all PCs, which we expect when
the right key is used. This is in line with the results of a normal DPA attack
where the correlation for unrelated samples can also be lower for the correct key
guess compared to the wrong key guess [10].
Fig. 5. Correlation trace for the wrong (upper) and the correct (lower) subkey guess
In order to quantify this, we add the absolute values of all samples of each
correlation trace x and divide this result by the total number of samples n in
order to create an average value avg.
$$avg_x = \frac{\sum_{i=1}^{n} |x_i|}{n}$$
where $x_i$ denotes the value of the sample at index i in correlation trace x.
We use this method to calculate the absolute average value of each correlation
trace for all samples. A plot of these values is shown in Fig. 6. The x-axis shows
the key guesses for each S-box, i.e. the first 64 values correspond to the 64 subkeys
of S-box 1 etc.
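The absolute average can be computed per key guess as in the sketch below. The
small correlation matrix is made up for illustration, and the rule for picking
the outlier (largest distance from the median score) is an assumption; the text
identifies the outlying values visually from the plot.

```python
import numpy as np


def abs_avg(correlation_traces):
    """Absolute average of each correlation trace (one row per key guess)."""
    return np.abs(np.asarray(correlation_traces, dtype=float)).mean(axis=1)


def most_outlying_guess(scores):
    """Index of the score furthest from the median (an assumed outlier rule)."""
    scores = np.asarray(scores, dtype=float)
    return int(np.argmax(np.abs(scores - np.median(scores))))


# Made-up correlation traces for three key guesses over four samples.
corr = np.array([[ 0.10, -0.12,  0.08, -0.10],
                 [ 0.30, -0.02,  0.01, -0.01],   # hypothetical correct guess
                 [-0.11,  0.09, -0.10,  0.12]])

scores = abs_avg(corr)
candidate = most_outlying_guess(scores)
```

For DES, this would be applied separately to the 64 subkey guesses of each
S-box, giving one candidate per S-box.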
Fig. 6. Absolute average value of each correlation trace for software DES
The main thing we notice in this graph is the 8 outlying peaks at different
locations. A peak means that the absolute average value of one correlation trace
is lower than the value of the other correlation traces. So we can basically say
that the correlation for that key guess is lower than the correlation of the other
key guesses. Since DES has 8 subkeys we can easily distinguish these and derive
the secret key that was used. We have found similar results for a hardware DES
and an FPGA implementation of AES-256.
3.3PCA on Misaligned Traces
An effective countermeasure against DPA attacks is the introduction of random
delays during execution of the algorithm. This decreases the effectiveness of DPA
compared to a normal execution, since the S-boxes are processed at different
moments in time.
Performing DPA on PCA-transformed traces, however, is not as sensitive to
timing. Misalignment creates, in essence, a correlation between different points
in time. This may result in these samples being projected onto a few PCs, which
would reduce the effect of the random delay countermeasure.
In order to test this hypothesis, we perform a PCA transformation on an
obtained trace set of 500 traces with 2081 samples of a smartcard performing
software DES with a random delay countermeasure.
After the PCA transformation of the traces, we find interesting patterns in
components 41 to 57; see Fig. 7.
To investigate this further, we perform a DPA attack on the PCA-transformed
traces and obtain the correlation traces for each key guess. From this we find
some peaks at the right key candidate for PC 46, which is very similar to PC 41.
It thereby appears that these components encode and gather the misaligned key
information.
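
A minimal sketch of this procedure, assuming NumPy and simulated data, is given below. The trace dimensions, the noise level, the delay range, and the leakage values are invented for illustration; the functions `pca_transform` and `correlation_per_component` are hypothetical helpers, not part of any tool named in this paper.

```python
import numpy as np


def pca_transform(traces: np.ndarray):
    """Project mean-centred traces onto their principal components.

    traces has shape (n_traces, n_samples). Returns (scores, components,
    variances), all ordered by decreasing variance.
    """
    centred = traces - traces.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    variances = eigvals[order]
    components = eigvecs[:, order]          # columns are the PCs
    scores = centred @ components
    return scores, components, variances


def correlation_per_component(scores: np.ndarray, hypothesis: np.ndarray):
    """Pearson correlation between each PC score column and a leakage hypothesis."""
    s = scores - scores.mean(axis=0)
    h = hypothesis - hypothesis.mean()
    denom = np.sqrt((s ** 2).sum(axis=0) * (h ** 2).sum())
    denom[denom == 0] = np.inf              # constant columns give correlation 0
    return (s.T @ h) / denom


# Simulated misaligned traces: the leaking sample is delayed by 0 to 7 samples.
rng = np.random.default_rng(0)
n_traces, n_samples = 200, 60
leak = rng.integers(0, 5, size=n_traces).astype(float)  # assumed leakage values
delays = rng.integers(0, 8, size=n_traces)
traces = rng.normal(0.0, 0.5, size=(n_traces, n_samples))
traces[np.arange(n_traces), 20 + delays] += leak

scores, components, variances = pca_transform(traces)
corr = correlation_per_component(scores, leak)
print("PC with the largest |correlation|:", int(np.argmax(np.abs(corr))))
```

In an actual attack the hypothesis vector would be computed per key guess from the known inputs, and `correlation_per_component` would be evaluated once per guess.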
When we calculate the absolute average value for each of the obtained
correlation traces, we get the graph shown in Fig. 8. (Please note that, due to
computational issues, we were only able to keep the DPA information for the
first five S-boxes.)
This experiment supports our hypothesis that misalignment causes a
correlation between the neighboring samples that have key leakage, so that
they are projected onto a few components. Using the absolute-average
distinguisher, we are able to fully extract the key of the software DES
implementation with random delays.
Fig. 7. The 40th and the 41st principal component of a PCA-transformed trace set with
random delays
Fig. 8. Absolute average value of each correlation trace for software DES
4 Comparison to Other Alignment Techniques
Several other algorithms have been proposed to handle the misalignment
countermeasure, e.g. [9,16,11]. In this section we compare PCA with one of them,
namely static alignment.
Static alignment is the most natural method for the treatment of misaligned
traces and is clearly described by Mangard et al. [9]. To apply the algorithm,
one first chooses a fragment in a so-called reference trace, ideally close to
the interval under attack. The algorithm then searches for the same fragment in
the other traces and shifts them accordingly, so that all traces are aligned on
the reference fragment. The main disadvantage of this method is its somewhat
reduced efficiency compared to more recent algorithms, but it does reduce the
number of traces required for a successful DPA attack.
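
The following sketch illustrates the static alignment steps just described. It is not the procedure from [9] verbatim: matching the fragment by least squared error, the zero padding after shifting, and all function and parameter names are assumptions made for this example.

```python
from typing import List, Sequence


def best_shift(trace: Sequence[float], fragment: Sequence[float],
               start: int, max_shift: int) -> int:
    """Shift s in [-max_shift, max_shift] at which the window of `trace`
    starting at start + s matches `fragment` with the least squared error."""
    best, best_err = 0, float("inf")
    m = len(fragment)
    for s in range(-max_shift, max_shift + 1):
        lo = start + s
        if lo < 0 or lo + m > len(trace):
            continue  # window would fall outside the trace
        err = sum((trace[lo + i] - fragment[i]) ** 2 for i in range(m))
        if err < best_err:
            best, best_err = s, err
    return best


def static_align(traces: Sequence[Sequence[float]], ref_index: int,
                 start: int, length: int, max_shift: int) -> List[List[float]]:
    """Align every trace on the fragment traces[ref_index][start:start+length]."""
    fragment = traces[ref_index][start:start + length]
    aligned = []
    for t in traces:
        s = best_shift(t, fragment, start, max_shift)
        n = len(t)
        # Move the trace by -s samples; pad with zeros where no data exists.
        aligned.append([t[i + s] if 0 <= i + s < n else 0.0 for i in range(n)])
    return aligned


reference = [0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]  # same pattern, 2 samples later
aligned = static_align([reference, delayed], ref_index=0, start=3, length=3, max_shift=3)
print(aligned[1] == reference)
```

The search range `max_shift` bounds the cost per trace; a cross-correlation criterion could be used instead of the squared error.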
We compare static alignment with PCA on the height of the peak for the
correct key guess. For both methods, we compare the difference in the height
of the peak, i.e. the correlation values for the correct key guess and for the
first wrong key guess. For PCA, we actually look at the difference in the height
of the average value of the correlation trace. To derive the values for static
alignment, we use the misaligned trace set from our sample card and statically
align the traces before doing a DPA attack. For PCA, we use the same (misaligned)
trace set, to which we first apply a PCA transformation, and afterwards we calculate the
absolute average value for 150 samples of the correlation traces. The results can
be found in Table 1.

Table 1. Comparison between Static alignment and PCA

                         Static alignment   PCA
Correct key guess        0.4035             0.0450
First wrong key guess    0.2869             0.0393
Difference               28.9%              12.7%
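
Table 1 does not state how the "Difference" row is computed. The reported percentages are consistent with the gap between the two values taken relative to the value for the correct key guess, which the short check below reproduces; the function name `relative_gap` is a hypothetical label for this assumed formula.

```python
def relative_gap(correct: float, first_wrong: float) -> float:
    """Gap between correct and first wrong guess, in percent of the correct value."""
    return 100.0 * (correct - first_wrong) / correct


static_gap = relative_gap(0.4035, 0.2869)  # static alignment column
pca_gap = relative_gap(0.0450, 0.0393)     # PCA column
print(f"static alignment: {static_gap:.1f}%  PCA: {pca_gap:.1f}%")
```

Under this reading the two columns yield 28.9% and 12.7%, matching the table.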
We see that PCA does not outperform static alignment, at least for the chosen
trace set. However, the comparison should not be interpreted too strictly, as the
metric used for PCA differs from the one used for static alignment, i.e. actual
correlation values (static alignment) versus absolute average values (PCA).
Nevertheless, the results indicate that PCA is a viable method for alignment as
well as for preprocessing. As future work, we plan to perform a meaningful
comparison with other, recently published alignment methods such as Elastic
alignment [16] and RAM [11].
5 Conclusions
In this work we introduce Principal Component Analysis as a suitable
preprocessing technique on power traces to enhance the effectiveness of DPA
attacks. In particular, we advocate two separate uses of PCA: noise reduction
and a PCA transformation (before the actual DPA). Our results are verified in
practice by several experiments on both protected and unprotected
implementations. In the experiments we were able to improve the signal-to-noise
ratio on various occasions when the location of the key leakage is known. We
were able to denoise a given trace set by retaining only the Principal
Components that capture the variance at the location of the key leakage. As a
result of this noise reduction, the correct subkey guesses had a higher
correlation when a DPA attack was performed on the noise-reduced trace set than
on the original trace set. This method worked for each of the trace sets we
used.
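
The noise reduction summarized here amounts to reconstructing the traces from a subset of principal components. The sketch below assumes NumPy and random toy data; the helper `pca_denoise` is hypothetical, and the choice of which components to keep is left to the caller, since it depends on where the key leakage is located.

```python
import numpy as np


def pca_denoise(traces: np.ndarray, keep) -> np.ndarray:
    """Reconstruct traces from the principal components whose indices are in `keep`.

    traces has shape (n_traces, n_samples); index 0 is the component with
    the largest variance.
    """
    mean = traces.mean(axis=0)
    centred = traces - mean
    # Rows of vt are the principal components, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[list(keep)]                  # shape (k, n_samples)
    return mean + (centred @ basis.T) @ basis


toy_rng = np.random.default_rng(1)
toy_traces = toy_rng.normal(size=(50, 20))
denoised = pca_denoise(toy_traces, keep=[0, 1, 2])  # keep three components (arbitrary)
print(denoised.shape)
```

Keeping all components returns the original traces, so the amount of removed variance is controlled entirely by `keep`.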
Acknowledgements. We would like to thank Riscure for providing an
environment for fruitful discussion during the research, and for providing the
side-channel analysis platform that was used for this work (Inspector). We also
thank Yang Li and Kazuo Sakiyama from the University of Electro-Communications,
Tokyo, for providing us with suitable traces from a SASEBO board. We thank Elena
Marchiori from RU Nijmegen for her insightful comments.
This work was supported in part by the IAP Programme P6/26 BCRYPT of
the Belgian State, by the European Commission under contract number
ICT-2007-216676 ECRYPT NoE phase II, and by the K.U.Leuven-BOF (OT/06/40).
References
1. Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template
Attacks in Principal Subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 1–14. Springer, Heidelberg (2006)
2. Bohy, L., Neve, M., Samyde, D., Quisquater, J.-J.: Principal and independent
component analysis for cryptosystems with hardware unmasked units. In: Proceedings
of eSmart 2003 (2003)
3. Brier, E., Clavier, C., Olivier, F.: Correlation Power Analysis with a Leakage Model.
In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29.
Springer, Heidelberg (2004)
4. Clavier, C., Coron, J.-S., Dabbous, N.: Differential Power Analysis in the Presence
of Hardware Countermeasures. In: Paar, C., Koç, Ç.K. (eds.) CHES 2000. LNCS,
vol. 1965, pp. 252–263. Springer, Heidelberg (2000)
5. Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual Information Analysis - A
Generic Side-Channel Distinguisher. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008.
LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008)
6. Hotelling, H.: Analysis of a complex of statistical variables into principal
components. The Journal of Educational Psychology, 417–441 (1933)
7. Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer Series in Statistics.
Springer, New York (2002)
8. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
9. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets
of Smart Cards (Advances in Information Security). Springer-Verlag New York,
Inc., Secaucus (2007)
10. Messerges, T.S.: Power analysis attacks and countermeasures for cryptographic
algorithms. PhD thesis, University of Illinois at Chicago, Chicago, IL, USA (2000)
11. Muijrers, R.A., van Woudenberg, J.G.J., Batina, L.: RAM: Rapid Alignment
Method. In: Prouff, E. (ed.) CARDIS 2011. LNCS, vol. 7079, pp. 266–282. Springer,
Heidelberg (2011)
12. Pearson, K.: On lines and planes of closest fit to systems of points in space.
Philosophical Magazine Series 2(6), 559–572 (1901)
13. Smith, L.I.: A tutorial on principal components analysis (February 2002),
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
14. Souissi, Y., Nassar, M., Guilley, S., Danger, J.-L., Flament, F.: First Principal
Components Analysis: A New Side Channel Distinguisher. In: Rhee, K.-H., Nyang,
D. (eds.) ICISC 2010. LNCS, vol. 6829, pp. 407–419. Springer, Heidelberg (2011)
15. Standaert, F.-X., Malkin, T.G., Yung, M.: A Unified Framework for the Analysis
of Side-Channel Key Recovery Attacks. In: Joux, A. (ed.) EUROCRYPT 2009.
LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009)
16. van Woudenberg, J.G.J., Witteman, M.F., Bakker, B.: Improving Differential
Power Analysis by Elastic Alignment. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS,
vol. 6558, pp. 104–119. Springer, Heidelberg (2011)