Principal Component Analysis over encrypted data using homomorphic encryption

Hilder V. L. Pereira¹, Diego F. Aranha¹

¹Institute of Computing (UNICAMP)
Av. Albert Einstein, 1251, 13083-852, Campinas-SP, Brazil

hilder@lasca.ic.unicamp.br, dfaranha@ic.unicamp.br
Abstract. We describe an algorithm to perform Principal Component Analysis (PCA) over encrypted data using homomorphic encryption. PCA is a fundamental tool for exploratory data analysis and dimensionality reduction, and thus a useful application for privacy-preserving computation in the cloud.
1. Introduction
The increasingly intrusive behavior of governments and corporations, together with the sensitive information leaks observed this year, puts into question the long-term viability of cloud computing as the prominent industry paradigm. Although this has been an inherent risk since the introduction of cloud computing, the security and privacy issues associated with delegating computation to a third party became self-evident only recently.
A possible solution to accommodate these conflicting requirements is computing over encrypted data. In this model, data is encrypted by a transformation which preserves part of its structure and allows further execution of certain operations. Because of practical difficulties with fully homomorphic encryption, which allows computation arbitrary in both type and number of operations, a growing research area is dedicated to studying partially homomorphic schemes and to adapting algorithms to work correctly in the encrypted domain.

In this work, we propose an algorithm for performing PCA over encrypted data stored in the cloud. PCA is a fundamental step in data analysis and machine learning, and thus a promising application for privacy-preserving computing. The proposed algorithm is non-interactive in nature and compatible with somewhat homomorphic encryption schemes.
2. Preliminaries
In this section, we recall basic Linear Algebra results, without proofs due to space constraints.
Definition 2.1 (Eigenvector and eigenvalue). Let $X$ be a real matrix in $\mathbb{R}^{n \times n}$. We say that a scalar $\lambda \in \mathbb{R}$ is an eigenvalue of $X$ if there exists a non-zero vector $v \in \mathbb{R}^n$ such that $Xv = \lambda v$. We also say that $v$ is the eigenvector associated with $\lambda$ and that $(\lambda, v)$ is an eigenpair of $X$. Eigenvectors are invariant to multiplication by a scalar, and the dominant eigenvalue of $X$ is the one with the largest absolute value.
Definition 2.2 (Shifting eigenpairs). We say that a procedure shifts the eigenvalues of a matrix $X$ if it returns any matrix $B$ such that the dominant eigenvalue of $B$ is equal to the second dominant eigenvalue of $X$ and their associated eigenvectors are the same. More formally, given $X$ and dominant eigenpair $(\lambda_i, v_i)$, a function $f$ shifts the eigenvalues of $X$ if $f(X) = B \in \mathbb{R}^{n \times n}$ with dominant eigenpair $(\lambda_{i+1}, v_{i+1})$.
Theorem 2.3 (Spectral Theorem [Watkins 2005]). Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric. Then it can be written as $A = UDU^T$, where $U$ is an orthogonal matrix whose columns are normalized eigenvectors and $D$ is a diagonal matrix with the eigenvalues on the principal diagonal, in an order corresponding to the columns of $U$. In other words, for $i \in \{1, 2, \ldots, n\}$, the pair $(D_{ii}, U_i)$ is an eigenpair, where $U_i$ is the $i$-th column of $U$.
Corollary 2.4 (Symmetric matrix as a sum). Let $A \in \mathbb{R}^{n \times n}$ be symmetric and $(\lambda_1, v_1), (\lambda_2, v_2), \ldots, (\lambda_n, v_n)$ eigenpairs of $A$, with $\|v_i\| = 1$ for $i \in \{1, 2, \ldots, n\}$. Then $A$ may be written as
$$A = \sum_{i=1}^{n} \lambda_i v_i v_i^T.$$
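To make the corollary concrete, here is a small NumPy check (an illustrative snippet of ours, not part of the original text):

```python
import numpy as np

# Build a random symmetric matrix and rebuild it from its eigenpairs.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T                           # symmetric

w, U = np.linalg.eigh(A)              # eigenvalues w, normalized eigenvectors in the columns of U
A_rebuilt = sum(w[i] * np.outer(U[:, i], U[:, i]) for i in range(4))
assert np.allclose(A, A_rebuilt)      # A = sum_i lambda_i v_i v_i^T
```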
3. Principal Component Analysis
The problem of finding the principal components of a data matrix $X$ is equivalent to the problem of finding the eigenvectors of its covariance matrix. In general, the $i$-th principal component is the $i$-th dominant eigenvector [Jolliffe 2002]. Hence, to project the data into a $K$-dimensional space, we have to find the $K$ dominant eigenvectors.
3.1. Power Method
The Power Method is a simple iterative algorithm to find a dominant eigenvector of a given matrix. Let $A \in \mathbb{R}^{n \times n}$ be a real matrix. We sample a random initial vector $u \in \mathbb{R}^n$ and multiply $A$ by $u$ repeatedly, generating the sequence $Au, A^2u, A^3u, \ldots$, which converges to a dominant eigenvector. If we write the initial vector $u$ as a linear combination of the eigenvectors $v_1, v_2, \ldots, v_n$, we have:
$$A^k u = A^k(\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \cdots + \alpha_n v_n) = \lambda_1^k \alpha_1 v_1 + \lambda_2^k \alpha_2 v_2 + \cdots + \lambda_n^k \alpha_n v_n.$$
Assuming that $v_1$ is a dominant eigenvector, we have $|\lambda_1^k| > |\lambda_i^k|$ for $i \in \{2, 3, \ldots, n\}$. Therefore, if we divide both sides by $\lambda_1^k$, the sequence converges to a multiple of $v_1$:
$$\frac{A^k u}{\lambda_1^k} = \alpha_1 v_1 + \frac{\lambda_2^k}{\lambda_1^k} \alpha_2 v_2 + \cdots + \frac{\lambda_n^k}{\lambda_1^k} \alpha_n v_n.$$
In order to avoid underflow and overflow in practice, it is common to divide the sequence by a scaling factor $\theta_k$. The resulting algorithm for the Power Method is given below:
powerMethod(A)
  N = A.lines
  u = randomVector(N)
  for k = 1 to STEPS
    u = Au / θ_k
  return u
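For concreteness, the following is a minimal plaintext sketch of the Power Method in Python with NumPy (our illustration; the fixed scaling factor and iteration count are choices of ours, not prescribed by the pseudocode above):

```python
import numpy as np

def power_method(A, steps=100, theta=None):
    """Approximate a dominant eigenvector of A by repeated multiplication."""
    rng = np.random.default_rng()
    u = rng.standard_normal(A.shape[0])   # random initial vector
    if theta is None:
        theta = np.linalg.norm(A, 2)      # scaling factor to avoid under/overflow
    for _ in range(steps):
        u = (A @ u) / theta               # the direction of u converges to v_1
    return u
```

Note that in the encrypted setting $\theta_k$ must be a public constant fixed in advance, as discussed in Section 4.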
3.2. Finding K principal components
Our strategy to find the principal components is to calculate the covariance matrix of the data and find the $K$ dominant eigenvectors by repeatedly using the Power Method and a shifting procedure. Since the covariance matrix is symmetric, the following function works as a shifting procedure:

eigenShift(A, dominant eigenvector v)
  u = v / ||v||
  return B = A - A u u^T
Theorem 3.1. Let $A$ be an $n \times n$ real symmetric matrix. Then the function eigenShift shifts the eigenvalues of $A$.
Proof. Since the first operation of eigenShift is normalizing $v$, we have that $v$ becomes equal to $v_1$, the normalized dominant eigenvector. Since $v_1^T v_1 = \|v_1\|^2 = 1$, we have
$$Bv_1 = Av_1 - (Av_1 v_1^T)v_1 = Av_1 - Av_1(v_1^T v_1) = Av_1 - Av_1 = \lambda_1 v_1 - \lambda_1 v_1 = 0 \cdot v_1,$$
which proves that $v_1$ is also an eigenvector of $B$, but now associated with a new eigenvalue $\lambda_{new} = 0$. By Corollary 2.4, the matrix $A$ may be written as $A = \sum_{i=1}^{n} \lambda_i v_i v_i^T$, and thus
$$B = \lambda_2 v_2 v_2^T + \lambda_3 v_3 v_3^T + \cdots + \lambda_n v_n v_n^T.$$
For all $i \in \{2, 3, \ldots, n\}$, we have $Bv_i = \lambda_2 v_2 v_2^T v_i + \cdots + \lambda_i v_i v_i^T v_i + \cdots + \lambda_n v_n v_n^T v_i$. By Theorem 2.3, all the eigenvectors are orthogonal, so for $j \neq i$ the product $v_j^T v_i$ is equal to 0, and the product $v_i^T v_i$ is equal to 1. Then $Bv_i = 0 + 0 + \cdots + 0 + \lambda_i v_i \cdot 1 + 0 + \cdots + 0 = \lambda_i v_i$, which proves that all the other eigenpairs of $A$ are also eigenpairs of $B$. Therefore, all the eigenvectors of $A$ are also eigenvectors of $B$, the dominant eigenvector of $A$ is associated with the eigenvalue $\lambda_{new} = 0$, and the second dominant eigenvalue of $A$ is the dominant eigenvalue of $B$.
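The deflation can also be checked numerically; the following NumPy snippet (ours, for illustration) confirms that the shift zeroes out the dominant eigenvalue and preserves the rest of the spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M + M.T                                    # symmetric test matrix

w, V = np.linalg.eigh(A)
v1 = V[:, np.argmax(np.abs(w))]                # normalized dominant eigenvector
B = A - A @ np.outer(v1, v1)                   # eigenShift(A, v1)

print(np.sort(np.abs(w)))                      # spectrum of A
print(np.sort(np.abs(np.linalg.eigvalsh(B))))  # same spectrum, dominant value now ~0
```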
In order to calculate the covariance matrix, we just have to set the mean of each variable (column of the data matrix) to zero and then perform a matrix multiplication. Note that the procedure below returns $C = X^T X$, the covariance matrix up to the constant factor $1/N$; this scaling changes only the eigenvalues, not the eigenvectors we are after.
covarianceMatrix(X)
  N = X.lines
  P = X.columns
  for j = 1 to P
    μ = 0
    for i = 1 to N
      μ = μ + X[i][j]
    μ = μ / N
    /* Subtract the mean. */
    for i = 1 to N
      X[i][j] = X[i][j] - μ
  C = X^T X
  return C
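In NumPy, the same computation reads as follows (our plaintext reference; as in the pseudocode, the $1/N$ factor is applied only to the means):

```python
import numpy as np

def covariance_matrix(X):
    """Center each column of X and return C = X^T X (covariance up to 1/N)."""
    Xc = X - X.mean(axis=0)   # set the mean of each variable to zero
    return Xc.T @ Xc
```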
Our proposal for solving the PCA problem is the following:

PCA(X, new dimension K)
  C = covarianceMatrix(X)
  pcs = ∅
  for i = 1 to K
    pc_i = powerMethod(C)
    C = eigenShift(C, pc_i)
    pcs = {pc_i} ∪ pcs
  return pcs
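Putting the pieces together, here is a minimal plaintext sketch of the whole pipeline, assuming the power_method and covariance_matrix helpers sketched above (in an encrypted deployment, each arithmetic operation would be replaced by its homomorphic counterpart):

```python
import numpy as np

def pca(X, K, steps=100):
    """Return the K dominant eigenvectors of the covariance matrix of X."""
    C = covariance_matrix(X)
    pcs = []
    for _ in range(K):
        v = power_method(C, steps=steps)
        v = v / np.linalg.norm(v)       # normalize before shifting
        C = C - C @ np.outer(v, v)      # eigenShift: deflate the dominant pair
        pcs.append(v)
    return pcs

# Usage: project the centered data onto the first two principal components.
# X = np.random.rand(200, 10)
# W = np.column_stack(pca(X, K=2))
# Y = (X - X.mean(axis=0)) @ W
```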
4. Homomorphic version
Employing a Somewhat Homomorphic Encryption (SHE) scheme such as [Bos et al. 2013] for privacy-preserving computation involves some restrictions on the operations that can be performed on the data. Usually, we can only perform additions and a few multiplications over the ciphertexts, and general divisions are not viable. If encoding real numbers is possible, as in [Aono et al. 2015], we can also divide the ciphertexts by constants or any other known values (the number $n$ of elements submitted by the client, for example). Because of these restrictions, we have to modify the algorithms to remove divisions between ciphertexts and to minimize the number of consecutive multiplications.
For the Power Method, the value $\theta_k$ can be chosen as a constant, and the computation of the covariance matrix only employs a value known a priori, thus divisions can be performed between ciphertexts and plaintexts. The remaining obstacle is the eigenShift procedure. Since the components of the vectors are encrypted, we cannot normalize $v$ by dividing it by its norm (the first operation of eigenShift), as in $B = A - A\frac{v}{\|v\|}\frac{v^T}{\|v\|}$.
However, Definition 2.1 tells us that $B$ and $\|v\|^2 B$ have the same eigenvectors. Hence:
$$\|v\|^2 B = \|v\|^2 \left(A - A\frac{v}{\|v\|}\frac{v^T}{\|v\|}\right) = \|v\|^2 A - A v v^T,$$
which means that we can compute $\|v\|^2 B$ from $A$ and $v$ without divisions between ciphertexts. Finally, using the relation between the inner product and the Euclidean norm, namely $v^T v = \|v\|^2$, the homomorphic version of the eigenShift function can be described as follows:
homomorphicShift(A, v)
  α = innerProduct(v, v)
  B = αA - A v v^T
  return B
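A plaintext NumPy rendering of this division-free shift (illustrative; under an SHE scheme, each addition and multiplication below would act on ciphertexts):

```python
import numpy as np

def homomorphic_shift(A, v):
    """Division-free eigenShift: returns ||v||^2 * B, which has the same
    eigenvectors as B = A - A (v/||v||)(v/||v||)^T."""
    alpha = v @ v                         # innerProduct(v, v) = ||v||^2
    return alpha * A - A @ np.outer(v, v)
```

Since the returned matrix is $\|v\|^2 B$, its eigenvalues are those of $B$ scaled by $\|v\|^2$; this only rescales the Power Method iterates and can be absorbed into the scaling factor $\theta_k$.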
This way, the entire Power Method can be computed over encrypted data.
5. Conclusion
Principal Component Analysis can be computed in a privacy-preserving way by adapting all of the required steps in the Power Method to remove expensive divisions and by employing a Somewhat Homomorphic Encryption scheme with a bounded number of multiplications. As far as we know, this is the first non-interactive proposal for performing PCA over encrypted data in the cloud.
References
Aono, Y., Hayashi, T., Phong, L. T., and Wang, L. (2015). Fast and secure linear regression and biometric authentication with security update. Cryptology ePrint Archive, Report 2015/692. http://eprint.iacr.org/.

Bos, J. W., Lauter, K., Loftus, J., and Naehrig, M. (2013). Improved security for a ring-based fully homomorphic encryption scheme. In Cryptography and Coding (IMACC), pages 45–64. Springer.
Jolliffe, I. T. (2002). Principal Component Analysis. Springer Series in Statistics.
Watkins, D. S. (2005). Fundamentals of Matrix Computations. Wiley, 2nd edition.