Getting More from PCA:
First Results of Using Principal Component
Analysis for Extensive Power Analysis
Lejla Batina1,2, Jip Hogenboom3,*, and Jasper G.J. van Woudenberg4
1Radboud University Nijmegen, ICIS/Digital Security group
Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
2K.U.Leuven ESAT/SCD-COSIC and IBBT
Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
3KPMG Advisory N.V.
Laan van Langerhuize 1, 1186 DS Amstelveen, The Netherlands
4Riscure BV
Delftechpark 49, 2628 XJ Delft, The Netherlands
Abstract. Differential Power Analysis (DPA) is commonly used to ob-
tain information about the secret key used in cryptographic devices.
Countermeasures against DPA can cause power traces to be misaligned,
which reduces the effectiveness of DPA. Principal Component Analysis
(PCA) is a powerful tool, which is used in different research areas to
identify trends in a data set. Principal Components are introduced to
describe the relationships within the data. The largest principal compo-
nents capture the data with the largest variance. These Principal Com-
ponents can be used to reduce the noise in a data set or to transform
the data set in terms of these components. We propose the use of Prin-
cipal Component Analysis to improve the correlation for the correct key
guess for DPA attacks on software DES traces and show that it can also
be applied for other algorithms. We also introduce a new way of deter-
mining key candidates by calculating the absolute average value of the
correlation traces after a DPA attack on a PCA-transformed trace. We
conclude that Principal Component Analysis can successfully be used as
a preprocessing technique to reduce the noise in a trace set and improve
the correlation for the correct key guess using Differential Power Analysis
attacks.
Keywords: Side-channel cryptanalysis, DPA, countermeasures, PCA.
* This work was done when the author was with Radboud University Nijmegen.
O. Dunkelman (Ed.): CT-RSA 2012, LNCS 7178, pp. 383–397, 2012.
© Springer-Verlag Berlin Heidelberg 2012
1 Introduction
Side-channel attacks are indirect methods which are used to find secret keys in
cryptographic devices. On these devices, cryptographic algorithms are
implemented to ensure encrypted communication. Smart cards can sometimes
contain a software implementation of a cryptographic algorithm, but often they
also include a cryptographic co-processor, where the larger devices usually have
dedicated hardware implementations. The secret keys used for the algorithms
are usually well protected within these devices.
A widely used method to recover secret keys is by using side-channel infor-
mation. Side-channel attacks make use of leaked physical information, such as
power consumption, electromagnetic (EM) radiation etc. This information is
leaked because of weaknesses in the physical implementation of the algorithm.
An example of a widely used side-channel attack is Differential Power Anal-
ysis (DPA) . The power consumption of a cryptographic device is dependent
on the data being processed and in particular on the secret key that is used for
encryption (decryption). This power consumption is measured while the secret
key is manipulated within a cryptographic device and the corresponding power
traces are collected. Performing DPA on the traces involves computing statistics
on the power measurements and modeled traces. In this way, the attacker
is using a side-channel distinguisher (e.g. DoM , Pearson correlation coeffi-
cient , Mutual Information  etc.) on the actual traces and the predictions
for the measurements in order to test the hypothesis about (the part of) the
used cryptographic key.
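As a concrete illustration of such a correlation-based attack, the following sketch simulates traces that leak the Hamming weight of a 4-bit S-box output and recovers the key nibble with the Pearson correlation distinguisher. This is a minimal sketch under our own assumptions: the PRESENT S-box, the leakage position, the noise level, and the trace count are illustrative choices, not the paper's measurement setup.

```python
import numpy as np

# 4-bit PRESENT S-box, used here only as a small nonlinear target.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SECRET_KEY = 0xB
N_TRACES, N_SAMPLES, LEAK_AT = 500, 100, 40

def hw(x):
    # Hamming weight of a nibble
    return bin(int(x)).count("1")

# Simulated traces: Gaussian noise everywhere, plus the Hamming weight of
# the S-box output at one sample (the leakage point).
rng = np.random.default_rng(0)
plaintexts = rng.integers(0, 16, N_TRACES)
traces = rng.normal(0.0, 1.0, (N_TRACES, N_SAMPLES))
traces[:, LEAK_AT] += [hw(SBOX[p ^ SECRET_KEY]) for p in plaintexts]

def cpa(traces, plaintexts):
    # Correlate the hypothetical leakage for every key guess against every
    # sample; the correct guess yields the highest absolute correlation.
    centered = traces - traces.mean(axis=0)
    norms = np.linalg.norm(centered, axis=0)
    best = []
    for guess in range(16):
        model = np.array([hw(SBOX[p ^ guess]) for p in plaintexts], float)
        model -= model.mean()
        corr = model @ centered / (np.linalg.norm(model) * norms)
        best.append(np.max(np.abs(corr)))
    return int(np.argmax(best))

print(hex(cpa(traces, plaintexts)))  # recovers the secret key nibble
```

A real attack works identically, except that the traces come from measurements and the intermediate value is a round function of the targeted cipher.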
To defend against side-channel analysis, manufacturers of cryptographic de-
vices usually implement countermeasures on their devices to complicate DPA
substantially. Common methods include masking the sensitive values of data i.e.
the variables depending on known data (e.g. plaintext) and the hypothesized key,
and hiding of the dependency of power on data  in specific time moments. The
latter can be obtained by various means e.g. random process interrupts (RPI) ,
random process order, unstable clocks etc. For example, when RPI are used as a
countermeasure the position of the leakage that is exploited by DPA can shift a
few clock cycles. In this way, locating the specific time points, where the key is
processed is further obfuscated. Due to all those countermeasures pre-processing
power traces has become an important step in side-channel analysis.
Principal Component Analysis (PCA)  is a technique which is widely used
to reduce the noise or the dimensionality in a data set, while retaining the most
variance. PCA results in a new ordered set of vectors that form an orthogonal
basis for a data set. Each basis vector, or Principal Component (PC), captures
the highest variance of all following PCs. PCA is used in many different domains
such as gene analysis and face recognition.
An interesting property of PCA in the context of trace set analysis is that
correlating samples in time are projected onto one or a few PCs. As time-domain
traces often have multiple samples where leakage is present, we hypothesize
that these samples will be projected onto only a few PCs. This implies that
effective filtering strategies are possible when other PCs are filtered out, and
additionally CPA could be performed on PCA-transformed traces. In this paper we
explore these ideas. We show several directions for the PCA tools to improve
side-channel analysis in pre-processing as well as in the actual analysis.
We first use PCA to transform trace sets such that, even when leakage
of the key bits (through the sensitive variables) appears at different points in
time, the trace set can be still analyzed with DPA. As PCA transform can reduce
the dimension of the trace set, we also show how to transform the original data
to a new trace set in terms of the Principal Components. After applying this
transformation, the most variance within the data is included in the first part
(the first Principal Components) of the transformed trace set. This fact is also
used by , where the authors defined a new side-channel distinguisher based
on the first principal component.
There were several attempts to deploy PCA in side-channel cryptanalysis, but
its full potential is yet to be unleashed. The first investigation was performed
by Bohy et al. . They considered PCA as a method to improve power attacks.
However, their results cover the effects of PCA on SPA only, while further
studies extend to PCA on DPA and CPA.
Archambeau et al.  used PCA for template attacks. In this approach the
traces are first transformed by PCA in order to perform interest point selection.
Indeed, in the pre-processing phase the attacker builds templates in order to
complete the profiling phase by using a clone of the device under attack. Then,
the templates are used to mount an attack on the real device. The top prin-
cipal components are used to capture the maximum variance between different
In contrast to , Souissi et al.  used PCA not as a pre-processing tool but as
a side-channel distinguisher. The new distinguisher follows the usual steps of
differential power analysis (DPA  or CPA ), consists of a computational phase
only, and does not require an identical device for profiling.
Our Contribution. Our work does not consider the scenario of template
attacks (which are assumed to be the strongest side-channel attacks ), nor do
we deploy PCA as yet another distinguisher. We introduce PCA as a suitable
pre-processing technique on the power traces to enhance the results of DPA. The
two benefits of PCA we observe are noise reduction and a PCA transformation
(leading to more efficient DPA). Both were analyzed, and several experiments
were performed on unprotected implementations as well as on implementations
with countermeasures. Additionally, we investigate the suitability of PCA on
misaligned traces, where it compares well to e.g. static alignment. We conclude
that PCA has much potential in the field of side-channel cryptanalysis, and we
expect more research to evolve.
The remainder of this paper is organized as follows. Section 2 describes some
background on PCA and its applications. Our experiments with PCA related to
DPA are described in Sect. 3. We compare our results to some previous work in
Sect. 4. In Sect. 5 we conclude this work and discuss our findings.
2 Principal Component Analysis
Principal Component Analysis (PCA) is a multivariate technique that is widely
used to reduce the noise or the dimensionality in a data set, while retaining the
most variance. The origin of PCA goes back more than 100 years to Pearson ;
a later relevant formulation is due to Hotelling .
PCA computes a set of new orthogonal variables (by means of eigenvectors)
with decreasing variances within the data set, producing Principal Components
(PCs). The largest variance is captured by the first component. PCA is
used in many different domains such as gene analysis and face recognition.
When we consider PCA in terms of power measurements, we have a data set
where the dimensionality is equal to the number of samples and the number of
observations is equal to the number of traces. This means that the number of
Principal Components which can be deduced from a power trace is (at most)
equal to the number of samples.
The main drawback of PCA is that an n × n covariance matrix (where n
is the number of samples) must be calculated. This means that the calculation
time increases quadratically with the number of samples.
In order to illustrate the way PCA works, we give a small example for a two-
dimensional (x,y) data set with 50 observations . In Fig. 1 (left) a plot of this
data set is given. The first principal component is required to have the largest
variance. The second component must be orthogonal to the first component
while capturing the largest variance within the data set in that direction. These
components are plotted in Fig. 1. This results in components which are sorted by
variance, where the first component captures the largest variance. If we transform
the data set using these principal components, the plot given in Fig. 1 (right)
will be obtained. This plot clearly shows that there is a larger variation in the
direction of the first principal component.
Fig. 1. A plotted data set with both of its principal components (left) and a plot of
the transformed data set with respect to both principal components (right) 
In general PCA is used when trying to extract the most interesting informa-
tion from data with large dimensions. More precisely, PCA attempts to find a
new representation of the original set by constructing a set of orthogonal vec-
tors spanning a subspace of the initial space. The new variables that are linear
combinations of the starting ones are called principal components.
Power traces usually have large dimensions, and one would like to find the
information of the key leakage within them.
In order to calculate PCA, the following few steps have to be performed .
– First, the mean is computed as the average over all n dimensions (samples):
M_n = (1/t) ∗ Σ_{i=1}^{t} T_{i,n}
where T_{i,n} means all traces are considered as n-dimensional vectors and t
denotes the number of traces. This mean M_n is afterwards subtracted from each
of the dimensions n for each trace T_i:
T_{i,n} = T_{i,n} − M_n
– The covariance matrix Σ is constructed. A covariance matrix is a matrix
whose (i,j)th element is the covariance between the ith and jth dimension
of each trace. This matrix will be an n × n matrix, where n is equal to the
number of samples (dimension) of the power traces. This means that the
calculation time increases quadratically relative to the number of samples.
In general, the covariance for two n-dimensional vectors X and Y is defined as
Cov(X,Y ) = (1/(n − 1)) ∗ Σ_{i=1}^{n} (X_i − X̄) ∗ (Y_i − Ȳ)
Using the formula for the covariance, the covariance matrix is defined as
Σ_{n∗n} = (c_{i,j}), c_{i,j} = Cov(Dim_i, Dim_j)
where Dim_x is the xth dimension.
– Then the singular value decomposition (SVD) is computed, i.e. the eigenvectors
of the covariance matrix are calculated:
Σ = U ∗ Λ ∗ U^{−1}
Here Λ is a diagonal matrix (with the eigenvalues on the diagonal) and U is an
orthogonal matrix of eigenvectors of Σ. These eigenvectors and eigenvalues
already provide information about the patterns in the data.
The eigenvector corresponding to the largest eigenvalue is called the first
principal component; it corresponds to the direction with the most variance.
Since n eigenvectors can be derived, there are n principal components. They are
ordered from high to low based on their eigenvalues, so that the most relevant
principal components are sorted first.
In order to reduce the dimension, we can optionally choose (first) p components,
and form a matrix with these vectors in the columns. This matrix is called the
feature vector. In the literature, several tests are known that help in deciding
on the number of components to keep.
With these p feature vectors we have two choices. The original data can be
transformed to retain only p dimensions, or the noise of the original data set can
be reduced using some components while keeping all n dimensions.
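The steps above can be sketched in a few lines of NumPy. This is a generic PCA sketch consistent with the description, not code from the paper; the trace dimensions below are arbitrary.

```python
import numpy as np

def pca(traces):
    """PCA of a trace set: rows are traces (observations), columns are
    samples (dimensions), as in the setting described above."""
    mean = traces.mean(axis=0)            # M_n: per-sample mean over traces
    centered = traces - mean              # subtract the mean per dimension
    cov = np.cov(centered, rowvar=False)  # n x n covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)  # eigh: covariance is symmetric
    order = np.argsort(eigval)[::-1]      # order PCs by decreasing variance
    return eigval[order], eigvec[:, order], mean

def transform(traces, eigvec, mean, p=None):
    # Y = (X * U)^T in the text's notation; optionally keep only the
    # first p components (the feature vector) to reduce the dimension.
    u = eigvec if p is None else eigvec[:, :p]
    return (traces - mean) @ u

rng = np.random.default_rng(1)
traces = rng.normal(size=(200, 50))       # 200 traces of 50 samples
eigval, eigvec, mean = pca(traces)
scores = transform(traces, eigvec, mean, p=5)
print(scores.shape)  # (200, 5): the trace set expressed in 5 PCs
```

The variance of the scores along the first column equals the largest eigenvalue, which is exactly the ordering property exploited in the rest of this paper.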
2.3 Assumptions and Properties of PCA-Transformed Data
When using this technique we are accepting some ground assumptions behind
PCA and correspondingly we have to carefully consider situations where the
assumptions are not (fully) valid.
Assumptions of PCA
– Linearity. This assumes that the new vectors, i.e. components, are linear
combinations of the original ones, so we rely on the same concept for leakage.
– Components with large variances are the most interesting ones. We show in
Sect. 3 that this is not always valid.
– The reduction of the dimension of the original data set does not lead to the
loss of important information. On the contrary, this can lead to better results
e.g. when noise is removed to improve the key recovery.
As time-domain traces often have multiple samples where leakage is present,
we hypothesize that these samples will be projected onto only a few PCs. This
implies that effective filtering strategies are possible when other PCs are filtered
out, and additionally CPA could be performed on PCA-transformed traces. We
investigate this in more detail in the remainder of this paper.
2.4 Multiple Leakage Points and PCA
As mentioned above PCA computes new variables i.e. principal components
which are derived as linear combinations of the original variables. As a conse-
quence, in the context of power analysis we have the following observation. An
interesting property of PCA in the context of trace set analysis is that corre-
lating samples in time are projected onto one or a few PCs. For instance, a PC
which has two positive peaks at times t0 and t1 implies that the original trace
set has positively correlating values at times t0 and t1. Also, the larger the PC
is, the larger the variance at these times is.
To show these properties, we analyze applying PCA transformations to some
simulated power traces. We first create a noisy trace set with three points of
leakage: a point A with leakage, a non-correlating point B with leakage, and a
point C with leakage negatively correlating with peak A. Each trace also has
some low noise added. Figure 2 shows twenty overlapped traces from this set.
We also create a similar set with misaligned peaks (Figure 3). For both trace
sets we calculate the principal components, as shown in Figures 4a and 4b.
First, it is clear that the first principal component captures the correlation
between the peaks A and C, and the second captures peak B. This also implies
that all samples in peaks A and C (and similarly peak B) are accumulated into
one dimension after the PCA transformation. This can potentially increase DPA
leakage, which is calculated per dimension. It is interesting that this holds for
both aligned and misaligned traces, which also shows that a transformation could
project misaligned peaks onto a few dimensions.
Second, in the aligned case, the first two principal components capture all
peak information. The other principal components all capture noise. For the
misaligned case, the other principal components represent different shifted peaks.
These experiments show that, even under misalignment, PCA transformations
can project multiple correlating points of leakage onto several PCs. As DPA on
a transformed trace set analyzes PCs separately, this may improve the signal-to-noise ratio.
Fig. 2. Aligned traces with three leakage points
Fig. 3. Misaligned traces with three leakage points
Fig. 4. First four principal components of the transformed trace sets: (a) aligned
peaks; (b) misaligned peaks
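A minimal reproduction of this simulated experiment might look as follows; the peak positions, amplitudes, and noise level are our own illustrative choices, not the values used for the paper's figures.

```python
import numpy as np

rng = np.random.default_rng(2)
n_traces, n_samples = 200, 100

# One random amplitude drives peaks A and C (C with opposite sign, i.e.
# negatively correlating with A); an independent amplitude drives peak B.
a = rng.normal(size=n_traces)
b = rng.normal(size=n_traces)
traces = 0.05 * rng.normal(size=(n_traces, n_samples))  # low noise
traces[:, 20] += a   # peak A
traces[:, 50] += b   # peak B, uncorrelated with A and C
traces[:, 80] -= a   # peak C, negatively correlating with A

cov = np.cov(traces - traces.mean(axis=0), rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]
pc1, pc2 = eigvec[:, order[0]], eigvec[:, order[1]]

# PC1 gathers the correlated peaks A and C; PC2 captures peak B.
print(sorted(np.argsort(np.abs(pc1))[-2:].tolist()))  # [20, 80]
print(int(np.argmax(np.abs(pc2))))                    # 50
```

The correlated A/C pair carries twice the variance of B, so it dominates the first component; samples 20 and 80 appear there with opposite signs, mirroring Fig. 4.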
Due to various countermeasures that aim at making DPA more difficult by e.g.
adding a lot of noise, it is sometimes required to perform a lot of pre-processing
to remove the noise for successful key recovery. In particular, one can use chosen
Principal Components to retain only certain (sensitive) information. One of the
assumptions behind PCA (cf. Sect. 2) is that PCs with larger variances represent
interesting data, while others with lower variances represent noise. Therefore,
the goal is to remove the components which contribute to the noise. The first
step is the same as in a transformation, the feature vector U is transposed and
multiplied with the transposed mean-adjusted data X:
Y = U^T ∗ X^T = (X ∗ U)^T
Hence, when extraction of the undesired, i.e. noise-representing, components is
performed, the procedure is as follows. The dimensionality of the data is reduced
to p by projecting each x in X onto y = U_p^T ∗ x, where U_p consists of the first
p columns of U and y is a p-vector. The PCA approximation (of the input x)
with only p principal components is then
x̃ = Σ_{j=1}^{p} (u_j^T ∗ x) ∗ u_j
and the (squared reconstruction) error can be shown to be equal to
Σ_{j=p+1}^{n} λ_j, that is, the sum of the eigenvalues for the unused eigenvectors.
Choosing the Right Components to Keep. There are extensive discussions
in the literature about the choice of components to keep in order to get
the maximum from using PCA. The ideas and approaches depend heavily on
applications. Since this method is mostly used to find the most distinctive data
(which usually is the data with the most variance), most of the literature deals
with deciding about the amount of the “smaller” components that can be left
out.
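The projection and reconstruction-error identity can be checked numerically; the data below is random and only illustrates the algebra, with dimensions chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(3)
# 300 observations of 20 dimensions with different per-dimension variances.
X = rng.normal(size=(300, 20)) * rng.uniform(0.1, 2.0, size=20)
Xc = X - X.mean(axis=0)  # mean-adjusted data

cov = np.cov(Xc, rowvar=False)
eigval, U = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]
eigval, U = eigval[order], U[:, order]

p = 5
Up = U[:, :p]                # U_p: the first p principal components
X_approx = (Xc @ Up) @ Up.T  # x~ = sum over j<=p of (u_j^T x) u_j, per row

# The mean squared reconstruction error matches the sum of the eigenvalues
# of the unused components (up to the 1/(N-1) sample-covariance convention).
mse = np.mean(np.sum((Xc - X_approx) ** 2, axis=1))
print(mse, eigval[p:].sum())
```

For noise reduction one would instead drop chosen components and map the data back with the remaining ones, keeping all n dimensions.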
For side-channel analysis, this is not the right route to take. Usually, power
traces contain a lot of noise and this noise typically has a large variation relative
to the DPA information we are looking to find. This means that, depending on
the process of collecting the power measurements to be analyzed, the noise can
also be captured by the largest principal components, especially for “real” trace
sets, i.e. the ones where countermeasures are deployed. Since we would like to
get rid of the noise, we need to find which principal components we can safely
remove without losing the data relating to the secret key. We address this issue
in our experiments.
3 Experiments
As described in Section 2, PCA can be used in two ways. We can perform a
transformation using PCA and do the analysis on Principal Components, or
we could use only a subset of Principal Components to reduce the noise in the
original trace set. In this section we address both aspects.
We performed our experiments by taking power measurements of a smartcard
which contains a software DES implementation. In order to test PCA against
countermeasures, we used an implementation that contains a configurable coun-
termeasure that introduces random delays. We used a Picoscope 5203 and a
modified smart card reader to obtain power measurements from the used smart-
card. In order to enhance the signal, we used an analog 48 MHz low-pass filter.
We performed all experiments also on a hardware DES implementation and on
implementations of other cryptographic algorithms i.e. AES and ECC. These
experiments showed the same results as described in Section 3, which means
that the method is not implementation or algorithm dependent.
3.1 Noise Reduction
We know that the process-related signal within the trace set has a large varia-
tion, so it should be captured by the largest Principal Components. Any noise
in the measurement is captured by smaller PCs. However, this general observa-
tion is not directly applicable to power consumption signals. More precisely, the
exact positions of key-related information differ for various implementations and
measurement set-ups.
Nevertheless, it is valuable to find out where the noise-related information
is located, either as a result of some countermeasures or due to measurement
set-ups that are used. If we remove these principal components and retain all
others (the ones which contain the key information), we might be able to reduce
the noise, i.e. improve the signal-to-noise ratio, and therefore enhance the DPA
results.
We tested this hypothesis on our set of power measurements of a software
DES implementation. We took 200 traces of 312 samples of the DES encryption.
All countermeasures were turned off, which meant that we could already find
the correct key using a CPA attack. Also, this meant that we knew in which
sample the key leakage was present.
We used this trace set for the rest of our experiments with noise reduction
using PCA. We plotted the principal components to see if they contained any
interesting properties. It appeared that different principal components captured
more or less information for the known key leakage samples. As an example, we
inspected the 15th principal component. It appeared to have a high peak for the
sample with the key leakage of S-Box 8 so it contained some information about
the data at that sample location. We tested this hypothesis by performing a
noise reduction retaining all principal components except this 15th. When we
performed a DPA attack on the resulting trace set, the correlation for the correct
key guess for S-box 8 dropped significantly.
Since we expect the largest amount of non-key information to be captured by
the largest principal components, we remove some of these largest components.
When we perform a DPA attack on the resulting noise-reduced trace set, we
can see an increase in the correlation value for the correct key guess. From this
observation, we can conclude that we removed more noise than signal.
We observed that different components capture the DPA information from
different S-boxes. This means that the best results are obtained if one knows
which component captures the most variation for the sample where the key
leakage is. Subsequently this means that for the best results, one needs to know
the sample with the key leakage. This implies that one should obtain a card of the
same type with a known key to find at which moment in time the information is
exploitable. In this way, a sort of profiling is performed i.e. templates are created
in order to speed-up the key recovery.
Another useful observation is on software versus hardware implementations.
Our findings show that hardware measurements obtained from a card with a
co-processor are more “noisy”; the best results were obtained by removing up
to the first 50 components. The exception was a set of measurements obtained
from a SASEBO-R board, where the highest key-dependent leakage was observed
within
the 3rd principal component. As a conclusion, there is a lot of potential in PCA
for noise reduction, but the threshold for improving the leakage has to be decided
on the basis of a given implementation. Nevertheless, we were able to improve
the leakage in all observed cases.
3.2 PCA Transformation
Whereas during noise reduction we first transform the trace set, remove some
components and then transform the trace set back for further analysis, we could
also only perform the transformation. This will put the Principal Components
on the main axis, which means that all variance that is correlated at different
points in time will be projected onto a few PCs as elaborated above.
CPA Highest-Peak Distinguisher. We used the same trace set as before
containing 200 traces with 312 samples. To see what effect a transformation has
on the results of a DPA attack, we performed a DPA attack on this transformed
trace set. We found that the correct key guess did not contain the highest peak
in the correlation graph. This means that we are not able to find the correct key
after a PCA transformation.
CPA Abs-Avg Distinguisher. However, when we inspected the correlation
graphs for all key guesses, we found that the graph for the correct key guess was
different from the graphs for the wrong key guesses, see Fig. 5. The correlation
for the first samples (which correspond to the higher principal components)
was higher for the correct key guess compared to the wrong key guess. The
main difference is however that the correlation for the lower samples was much
lower for the correct key guess compared to the wrong key guesses. Actually, the
conclusion is that variances are not the same for all PCs, which we expect when
the right key is used. This is in line with the results of a normal DPA attack
where the correlation for unrelated samples can also be lower for the correct key
guess compared to the wrong key guess .
Fig. 5. Correlation trace for the wrong (upper) and the correct (lower) subkey guess
In order to quantify this, we add the absolute values of all samples of each
correlation trace x and divide the result by the total number of samples n in
order to create an average value avg:
avg = (1/n) ∗ Σ_{i=1}^{n} |x_i|
where x_i denotes the value of the sample at index i in correlation trace x.
We use this method to calculate the absolute average value of each correlation
trace for all samples. A plot of these values is shown in Fig. 6. The x-axis shows
the key guesses for each S-box i.e. the first 64 values correspond to the 64 subkeys
of S-box 1 etc.
Fig. 6. Absolute average value of each correlation trace for software DES
The main thing we notice in this graph is the 8 outlying peaks at different
locations. A peak means that the absolute average value of one correlation trace
is lower than the value of the other correlation traces. So we can basically say
that the correlation for that key guess is lower than the correlation of the other
key guesses. Since DES has 8 subkeys we can easily distinguish these and derive
the used secret key. We have found similar results for a hardware DES and an
FPGA-implementation of AES-256.
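The absolute-average selection rule can be sketched as follows. The correlation traces here are synthetic stand-ins for the per-key-guess correlation traces obtained from DPA on PCA-transformed traces; the noise levels and the index of the correct guess are our own illustrative choices, modeling only the effect observed in Fig. 5 (lower overall correlation for the correct guess).

```python
import numpy as np

rng = np.random.default_rng(4)
n_guesses, n_samples = 64, 312

# Synthetic per-key-guess correlation traces for one S-box: for the
# correct guess, correlations at unrelated samples are markedly lower.
corr_traces = rng.normal(0.0, 0.10, size=(n_guesses, n_samples))
correct = 23  # arbitrary choice for the demo
corr_traces[correct] = rng.normal(0.0, 0.03, size=n_samples)

def abs_avg(corr_traces):
    # avg = (1/n) * sum of |x_i| over all samples, per correlation trace
    return np.mean(np.abs(corr_traces), axis=1)

avg = abs_avg(corr_traces)
print(int(np.argmin(avg)))  # the outlying low value marks the key guess
```

Selecting the minimum per 64-guess block for each of the 8 S-boxes yields the full DES subkey, as in Fig. 6.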
3.3 PCA on Misaligned Traces
An effective countermeasure against DPA attacks is the introduction of random
delays during execution of the algorithm. This decreases the effectiveness of DPA
as compared to a normal execution since the S-boxes are processed at different
moments in time.
Performing DPA on PCA-transformed traces however, is not as sensitive to
timing. Misalignment creates, in essence, a correlation between different points
in time. This may result in these samples being projected onto a few PCs, and
thereby reduce the effect of the random delay countermeasure.
In order to test this hypothesis, we perform a PCA transformation on an
obtained trace set of 500 traces with 2081 samples of a smartcard performing
software DES with a random delay countermeasure.
We first perform a PCA transformation of the traces. In components 41–57 we
find interesting patterns; see Fig. 7.
To investigate this further, we perform a DPA attack on the PCA-transformed
traces and obtain the correlation traces for each key guess. From these we find
some peaks at the right key candidate for PC 46, which is very similar to PC 41.
It thereby appears that these components encode and gather the misaligned key
leakage.
When we calculate the absolute average value for each of the obtained cor-
relation traces, we get the graph shown in Fig. 8. (Please note that, due to
computational constraints, we were only able to keep the DPA information for
the first five S-boxes.)
This experiment supports our hypothesis that misalignment causes a correlation
between the neighboring samples that carry key leakage, and that these samples
are therefore projected onto a few components. After applying the absolute-
average distinguisher, we are able to fully extract the key of the software DES
implementation with random delays.
Fig. 7. The 40th and the 41st principal components of a PCA-transformed trace set with random delays
Fig. 8. Absolute average value of each correlation trace for software DES
4 Comparison to Other Alignment Techniques
Several other algorithms have been proposed to handle the misalignment coun-
termeasure, e.g. [9,16,11]. In this section we compare PCA with one of them,
namely static alignment.
Static alignment is the most natural method for the treatment of misaligned
traces and is clearly described by Mangard et al. [9]. To apply the algorithm,
one first chooses a fragment in a so-called reference trace, which should ideally
be close to the attacked interval. The algorithm then searches for the same
fragment in the other traces and shifts them accordingly; in this way the traces
are aligned on the reference fragment. The main disadvantage of this method is
its somewhat reduced efficiency compared to more recent algorithms, but it does
improve on the number of traces required for a successful attack.
We compare static alignment with PCA with respect to the height of the peak
for the correct key guess. For both methods, we consider the difference in the
height of the peak, i.e. between the correlation values for the correct key guess
and for the first wrong key guess. For PCA, we actually look at the difference in
the height of the absolute average value of the correlation trace. To derive the
values for static alignment, we use the misaligned trace set from our sample card
and statically align the traces before performing a DPA attack. For PCA, we use
the same (misaligned) trace set, first perform a PCA transformation on it, and
afterwards calculate the absolute average value over 150 samples of the
correlation traces. The results can be found in Table 1.

Table 1. Comparison between static alignment and PCA

                        Static alignment    PCA
Correct key guess
First wrong key guess
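The static alignment step used in this comparison (pick a reference fragment, locate it in each trace, shift accordingly) can be sketched with a normalized cross-correlation search. This is our own minimal version, not the tool used for the experiments:

```python
import numpy as np

def static_align(traces, ref_trace, start, length):
    """Align every trace to ref_trace[start:start+length]: slide the
    fragment over each trace, pick the offset with maximal normalized
    correlation, and shift the trace so the match lands at `start`."""
    frag = ref_trace[start:start + length]
    frag = (frag - frag.mean()) / frag.std()
    aligned = np.zeros_like(traces)
    n_samples = traces.shape[1]
    for i, tr in enumerate(traces):
        best, best_off = -np.inf, 0
        for off in range(n_samples - length + 1):
            win = tr[off:off + length]
            s = win.std()
            if s == 0:
                continue  # flat window, e.g. zero padding
            score = np.dot(frag, (win - win.mean()) / s)
            if score > best:
                best, best_off = score, off
        shift = start - best_off        # move the match back to `start`
        if shift >= 0:
            aligned[i, shift:] = tr[:n_samples - shift]
        else:
            aligned[i, :n_samples + shift] = tr[-shift:]
    return aligned
```

After this step a standard DPA attack is run on the aligned set, which is how the static-alignment column of Table 1 was obtained.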
We see that PCA does not outperform static alignment, at least for the chosen
trace set. However, the comparison should not be taken too strictly, as the metric
used for PCA differs from the one used for static alignment, i.e. absolute average
values versus actual correlation values. Nevertheless, this shows PCA to be a
viable method for alignment as well as for pre-processing. As future work, we
plan to perform a meaningful comparison with other, recently published alignment
methods such as Elastic alignment [16] and RAM [11].
5 Conclusions

In this work we introduce Principal Component Analysis as a suitable preprocess-
ing technique on power traces to enhance the effectiveness of DPA attacks. In
particular, we advocate two separate uses of PCA: for noise reduction, and as a
transformation of the traces before the actual DPA. Our results are verified in
practice by several experiments on both protected and unprotected implemen-
tations. In the experiments we were able to improve the signal-to-noise ratio on
various occasions when the location of the key leakage is known. We were able
to de-noise a given trace set by retaining only the Principal Components that
capture the variance at the location of the key leakage. The effect of this noise
reduction was that the correct subkey guesses showed a higher correlation when
a DPA attack was performed on the noise-reduced trace set than when it was
performed on the original trace set. This method worked for each of the trace
sets we used.
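The noise-reduction variant — keep only the components that capture variance at the leakage location and map back to the time domain — can be sketched as follows. The function is illustrative; here the retained components are selected by index, whereas in practice they are chosen by inspecting where each component has its variance:

```python
import numpy as np

def pca_denoise(traces, keep):
    """Reconstruct a trace set from a subset of principal components.

    traces: (n_traces, n_samples); keep: indices of the components
    (ordered by decreasing variance) to retain."""
    mean = traces.mean(axis=0)
    centered = traces - mean
    # Right-singular vectors of the centered data are the PCs.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = vt[keep]                 # (n_keep, n_samples)
    scores = centered @ pcs.T      # project onto retained PCs
    return scores @ pcs + mean     # back-project into the time domain
```

Keeping all components reproduces the original traces exactly; dropping the components that do not carry leakage is what raises the correlation for the correct subkey in the subsequent DPA attack.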
Acknowledgements. We would like to thank Riscure for providing an envi-
ronment for fruitful discussion during the research, and for providing the side-
channel analysis platform that was used for this work (Inspector). We also thank
Yang Li and Kazuo Sakiyama from the University of Electro-Communications,
Tokyo, for providing us with suitable traces from a SASEBO board. We thank
Elena Marchiori from RU Nijmegen for her insightful comments.
This work was supported in part by the IAP Programme P6/26 BCRYPT of
the Belgian State and by the European Commission under contract number ICT-
2007-216676 ECRYPT NoE phase II and by the K.U.Leuven-BOF (OT/06/40).
References

1. Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template At-
tacks in Principal Subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS,
vol. 4249, pp. 1–14. Springer, Heidelberg (2006)
2. Bohy, L., Neve, M., Samyde, D., Quisquater, J.-J.: Principal and independent com-
ponent analysis for crypto-systems with hardware unmasked units. In: Proceedings
of e-Smart 2003 (2003)
3. Brier, E., Clavier, C., Olivier, F.: Correlation Power Analysis with a Leakage Model.
In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29.
Springer, Heidelberg (2004)
4. Clavier, C., Coron, J.-S., Dabbous, N.: Differential Power Analysis in the Presence
of Hardware Countermeasures. In: Paar, C., Koç, Ç.K. (eds.) CHES 2000. LNCS,
vol. 1965, pp. 252–263. Springer, Heidelberg (2000)
5. Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual Information Analysis - A
Generic Side-Channel Distinguisher. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008.
LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008)
6. Hotelling, H.: Analysis of a complex of statistical variables into principal compo-
nents. The Journal of Educational Psychology, 417–441 (1933)
7. Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer Series in Statistics.
Springer, New York (2002)
8. Kocher, P.C., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999)
9. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets
of Smart Cards (Advances in Information Security). Springer-Verlag New York,
Inc., Secaucus (2007)
10. Messerges, T.S.: Power analysis attacks and countermeasures for cryptographic
algorithms. PhD thesis, University of Illinois at Chicago, Chicago, IL, USA (2000)
11. Muijrers, R.A., van Woudenberg, J.G.J., Batina, L.: RAM: Rapid Alignment
Method. In: Prouff, E. (ed.) CARDIS 2011. LNCS, vol. 7079, pp. 266–282. Springer,
Heidelberg (2011)
12. Pearson, K.: On lines and planes of closest fit to systems of points in space. Philo-
sophical Magazine Series 6, 2(11), 559–572 (1901)
13. Smith, L.I.: A tutorial on principal components analysis (February 2002), http://
14. Souissi, Y., Nassar, M., Guilley, S., Danger, J.-L., Flament, F.: First Principal
Components Analysis: A New Side Channel Distinguisher. In: Rhee, K.-H., Nyang,
D. (eds.) ICISC 2010. LNCS, vol. 6829, pp. 407–419. Springer, Heidelberg (2011)
15. Standaert, F.-X., Malkin, T.G., Yung, M.: A Unified Framework for the Analysis
of Side-Channel Key Recovery Attacks. In: Joux, A. (ed.) EUROCRYPT 2009.
LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009)
16. van Woudenberg, J.G.J., Witteman, M.F., Bakker, B.: Improving Differential
Power Analysis by Elastic Alignment. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS,
vol. 6558, pp. 104–119. Springer, Heidelberg (2011)