Preprint

Sparse principal component analysis for high-dimensional stationary time series


Abstract

We consider sparse principal component analysis for high-dimensional stationary processes. Standard principal component analysis performs poorly when the dimension of the process is large. We establish oracle inequalities for penalized principal component estimators for processes including heavy-tailed time series. The consistency of the estimators is established even when the dimension grows at an exponential rate of the sample size. We also elucidate the theoretical rate for choosing the tuning parameter in the penalized estimators. The performance of sparse principal component analysis is demonstrated by numerical simulations. The utility of sparse principal component analysis for time series data is exemplified by an application to average temperature data.
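No code accompanies the abstract, so the following is only a minimal illustrative sketch (in Python) of the kind of $\ell_1$-penalized PCA it discusses: a soft-thresholded power iteration applied to the sample covariance matrix of a simulated stationary AR(1)-factor panel. The function name, the data-generating process, and the tuning parameter `lam` are illustrative choices, not the paper's estimator or its theoretically prescribed tuning rate.

```python
import numpy as np

def sparse_leading_eigenvector(S, lam, n_iter=200, tol=1e-8):
    """Soft-thresholded power iteration for an l1-penalized leading eigenvector.

    A generic sparse-PCA heuristic; lam plays the role of the tuning parameter
    whose theoretical rate is discussed in the abstract.
    """
    p = S.shape[0]
    v = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_iter):
        w = S @ v
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # soft threshold
        norm = np.linalg.norm(w)
        if norm == 0:
            return np.zeros(p)
        w /= norm
        if np.linalg.norm(w - v) < tol:
            return w
        v = w
    return v

# Toy stationary panel: an AR(1) factor loading on the first 5 of 50 coordinates.
rng = np.random.default_rng(0)
n, p = 400, 50
u = np.zeros(p)
u[:5] = 1.0 / np.sqrt(5)
f = 0.0
X = np.zeros((n, p))
for t in range(n):
    f = 0.6 * f + rng.normal()
    X[t] = 3.0 * f * u + rng.normal(size=p)
S = X.T @ X / n
v_hat = sparse_leading_eigenvector(S, lam=0.5)
print(np.nonzero(v_hat)[0])   # estimated support of the leading loading vector
```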


References
Article
In this note we develop an extension of the Marčenko–Pastur theorem to time series models with temporal correlations. The limiting spectral distribution (LSD) of the sample covariance matrix is characterised by an explicit equation for its Stieltjes transform depending on the spectral density of the time series. A numerical algorithm is then given to compute the density functions of these LSDs.
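For orientation, the classical (i.i.d.) Marčenko–Pastur density that this note generalizes can be evaluated directly; under temporal correlation the density must instead be computed numerically from the Stieltjes-transform equation. A small sketch of the baseline case (the grid and aspect ratio below are arbitrary choices):

```python
import numpy as np

def marchenko_pastur_density(x, y, sigma2=1.0):
    """Density of the classical Marchenko-Pastur law with aspect ratio y = p/n, 0 < y <= 1."""
    a = sigma2 * (1.0 - np.sqrt(y)) ** 2     # lower edge of the support
    b = sigma2 * (1.0 + np.sqrt(y)) ** 2     # upper edge of the support
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    dens[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2.0 * np.pi * sigma2 * y * x[inside])
    return dens

grid = np.linspace(0.01, 4.0, 5)
print(marchenko_pastur_density(grid, y=0.5))
```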
Article
This paper considers estimation of sparse covariance matrices and establishes the optimal rate of convergence under a range of matrix operator norm and Bregman divergence losses. A major focus is on the derivation of a rate sharp minimax lower bound. The problem exhibits new features that are significantly different from those that occur in the conventional nonparametric function estimation problems. Standard techniques fail to yield good results, and new tools are thus needed. We first develop a lower bound technique that is particularly well suited for treating "two-directional" problems such as estimating sparse covariance matrices. The result can be viewed as a generalization of Le Cam's method in one direction and Assouad's Lemma in another. This lower bound technique is of independent interest and can be used for other matrix estimation problems. We then establish a rate sharp minimax lower bound for estimating sparse covariance matrices under the spectral norm by applying the general lower bound technique. A thresholding estimator is shown to attain the optimal rate of convergence under the spectral norm. The results are then extended to the general matrix $\ell_w$ operator norms for $1 \le w \le \infty$. In addition, we give a unified result on the minimax rate of convergence for sparse covariance matrix estimation under a class of Bregman divergence losses.
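A minimal sketch of the kind of entrywise thresholding estimator analyzed in the paper; the threshold `lam`, which the theory takes of order $\sqrt{\log p / n}$ up to constants, is left here as a user-supplied value.

```python
import numpy as np

def threshold_covariance(X, lam):
    """Entrywise hard-thresholding of the sample covariance matrix; the diagonal is kept."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    T = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T
```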
Article
Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the original variables, which often makes the results difficult to interpret. We introduce a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings. We first show that PCA can be formulated as a regression-type optimization problem; sparse loadings are then obtained by imposing the lasso (elastic net) constraint on the regression coefficients. Efficient algorithms are proposed to fit our SPCA models for both regular multivariate data and gene expression arrays. We also give a new formula to compute the total variance of modified principal components. As illustrations, SPCA is applied to real and simulated data with encouraging results.
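A quick way to try this style of sparse PCA is scikit-learn's SparsePCA, which solves a closely related $\ell_1$-penalized formulation rather than the exact elastic-net SPCA criterion of the paper; the data and penalty level below are arbitrary.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
X[:, :5] += 2.0 * rng.normal(size=(200, 1))   # a common factor drives the first five columns

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
scores = spca.fit_transform(X)
print(np.round(spca.components_, 2))          # sparse loadings: most entries are exactly zero
```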
Article
We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance.
Article
In this paper we present a tail inequality for the maximum of partial sums of a weakly dependent sequence of random variables that are not necessarily bounded. The class considered includes geometrically and subgeometrically strongly mixing sequences. The result is then used to derive asymptotic moderate deviations results. Applications include classes of Markov chains, functions of linear processes with absolutely regular innovations, and ARCH models.
Book
High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin. Cambridge University Press.
Article
Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
Article
Most existing theoretical results for the lasso require the samples to be iid. Recent work has provided guarantees for the lasso assuming that the time series is generated by a sparse Vector Auto-Regressive (VAR) model with Gaussian innovations. Proofs of these results rely critically on the fact that the true data generating mechanism (DGM) is a finite-order Gaussian VAR. This assumption is quite brittle: linear transformations, including selecting a subset of variables, can lead to the violation of this assumption. In order to break free from such assumptions, we derive non-asymptotic inequalities for estimation error and prediction error of the lasso estimate of the best linear predictor without assuming any special parametric form of the DGM. Instead, we rely only on (strict) stationarity and geometrically decaying $\beta$-mixing coefficients to establish error bounds for the lasso for subweibull random vectors. The class of subweibull random variables that we introduce includes subgaussian and subexponential random variables but also includes random variables with tails heavier than an exponential. We also show that, for Gaussian processes, the $\beta$-mixing condition can be relaxed to summability of the $\alpha$-mixing coefficients. Our work provides an alternative proof of the consistency of the lasso for sparse Gaussian VAR models. But the applicability of our results extends to non-Gaussian and non-linear time series models, as the examples we provide demonstrate.
Article
Many scientific and economic problems involve the analysis of high-dimensional time series datasets. However, theoretical studies in high-dimensional statistics to date rely primarily on the assumption of independent and identically distributed (i.i.d.) samples. In this work, we focus on stable Gaussian processes and investigate the theoretical properties of $\ell_1$-regularized estimates in two important statistical problems in the context of high-dimensional time series: (a) stochastic regression with serially correlated errors and (b) transition matrix estimation in vector autoregressive (VAR) models. We derive nonasymptotic upper bounds on the estimation errors of the regularized estimates and establish that consistent estimation under high-dimensional scaling is possible via $\ell_1$-regularization for a large class of stable processes under sparsity constraints. A key technical contribution of the work is to introduce a measure of stability for stationary processes using their spectral properties that provides insight into the effect of dependence on the accuracy of the regularized estimates. With this proposed stability measure, we establish some useful deviation bounds for dependent data, which can be used to study several important regularized estimates in a time series setting.
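A rough sketch of problem (b), $\ell_1$-regularized estimation of a VAR(1) transition matrix, fitted row by row with scikit-learn's Lasso; the simulated process and the penalty level are illustrative choices, not the paper's procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_var1(X, lam):
    """Row-wise lasso estimate of A in the VAR(1) model X_t = A X_{t-1} + e_t."""
    Y, Z = X[1:], X[:-1]                      # responses and lagged predictors
    p = X.shape[1]
    A_hat = np.zeros((p, p))
    for j in range(p):
        fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10_000).fit(Z, Y[:, j])
        A_hat[j] = fit.coef_
    return A_hat

# Simulate a sparse, stable VAR(1) process and recover its transition matrix.
rng = np.random.default_rng(2)
p, n = 20, 500
A = 0.5 * np.eye(p)                           # sparse (diagonal) transition matrix
X = np.zeros((n, p))
for t in range(1, n):
    X[t] = A @ X[t - 1] + rng.normal(size=p)
A_hat = lasso_var1(X, lam=0.05)
print(np.round(np.abs(A_hat - A).max(), 3))   # small, up to lasso shrinkage bias
```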
Article
Penalized regression methods for inducing sparsity in the precision matrix have grown rapidly in the last few years. They are central to the construction of high-dimensional sparse Gaussian graphical models. This chapter presents two specific examples of sparse precision matrices. Then, it explains the connection among conditional independence, the notion of partial correlations between variables $Y_i$ and $Y_j$, and the regression coefficients when regressing $Y_i$ on all the other variables in $Y$. Further, the chapter describes a penalized likelihood method and the graphical Lasso (Glasso) algorithm for sparse estimation of the precision matrix. It concludes with various modifications of the Glasso estimators and their statistical properties.
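The graphical lasso is available in scikit-learn as GraphicalLasso; a small example on a chain-structured precision matrix (the graph and the penalty level below are arbitrary choices).

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
p = 10
Theta = np.eye(p)                              # chain-graph precision matrix
for i in range(p - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = 0.4
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=500)

glasso = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(glasso.precision_, 2))          # off-chain entries are shrunk to (near) zero
```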
Article
In this expository note, we give a modern proof of the Hanson-Wright inequality for quadratic forms in sub-gaussian random variables. We deduce a useful concentration inequality for sub-gaussian random vectors. Two examples are given to illustrate these results: a concentration of distances between random vectors and subspaces, and a bound on the norms of products of random and deterministic matrices.
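For reference, the inequality proved in the note can be stated as follows (sub-gaussian norm $\|\cdot\|_{\psi_2}$, absolute constant $c$):

```latex
% Hanson--Wright inequality, in the form proved in the note:
% X = (X_1,\dots,X_n) has independent, mean-zero, sub-gaussian coordinates with
% \|X_i\|_{\psi_2} \le K, A is a fixed n-by-n real matrix, and c > 0 is an absolute constant.
\[
  \mathbb{P}\left( \left| X^{\top} A X - \mathbb{E}\, X^{\top} A X \right| > t \right)
  \;\le\; 2 \exp\left[ -c \min\left( \frac{t^{2}}{K^{4}\|A\|_{F}^{2}},
                                      \frac{t}{K^{2}\|A\|} \right) \right],
  \qquad t \ge 0,
\]
% where \|A\|_F is the Frobenius norm and \|A\| the operator norm of A.
```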
Article
Principal component analysis (PCA) is one of the most commonly used statistical procedures with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy and an application of Fano's Lemma. The rate optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace which is fully data driven and can be computed efficiently. It is shown that the estimator attains the optimal rates of convergence simultaneously over a large collection of the parameter spaces. A key idea in our construction is a reduction scheme which reduces the sparse PCA problem to a high-dimensional multivariate regression problem. This method is potentially also useful for other related problems.
Article
Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret, because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e. loadings with very few non-zero elements. In this paper, we propose a new sparse PCA method, namely sparse PCA via regularized SVD (sPCA-rSVD). We use the connection of PCA with singular value decomposition (SVD) of the data matrix and extract the PCs through solving a low rank matrix approximation problem. Regularization penalties are introduced to the corresponding minimization problem to promote sparsity in PC loadings. An efficient iterative algorithm is proposed for computation. Two tuning parameter selection methods are discussed. Some theoretical results are established to justify the use of sPCA-rSVD when only the data covariance matrix is available. In addition, we give a modified definition of variance explained by the sparse PCs. The sPCA-rSVD provides a uniform treatment of both classical multivariate data and high-dimension-low-sample-size (HDLSS) data. Further understanding of sPCA-rSVD and some existing alternatives is gained through simulation studies and real data examples, which suggests that sPCA-rSVD provides competitive results.
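A minimal rank-one sketch of the regularized-SVD idea with an $\ell_1$ (soft-thresholding) penalty; the paper's sPCA-rSVD additionally covers other penalties, multiple components via deflation, and tuning-parameter selection.

```python
import numpy as np

def spca_rsvd_rank1(X, lam, n_iter=100):
    """Rank-one sparse PCA via regularized SVD: alternate between a unit-norm left
    vector u and a soft-thresholded right vector v in the approximation X ~ u v'."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], s[0] * Vt[0]              # start from the ordinary leading singular pair
    for _ in range(n_iter):
        v = X.T @ u
        v = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)   # soft-thresholding step
        if not np.any(v):
            break
        u = X @ v
        u /= np.linalg.norm(u)
    loading = v / np.linalg.norm(v) if np.any(v) else v
    return u, loading
```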
Article
In this paper, the authors propose procedures for detection of the number of signals in the presence of Gaussian white noise under an additive model. This problem is related to the problem of finding the multiplicity of the smallest eigenvalue of the covariance matrix of the observation vector. The methods used in this paper fall within the framework of the model selection procedures using information theoretic criteria. The strong consistency of the estimates of the number of signals, under different situations, is established. Extensions of the results are also discussed when the noise is not necessarily Gaussian. Also, certain information-theoretic criteria are investigated for determination of the multiplicities of various eigenvalues.
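As a concrete illustration of this family of criteria, here is a sketch of an MDL rule of the Wax-Kailath type applied to sample eigenvalues; the exact criteria and penalty terms studied in the paper may differ.

```python
import numpy as np

def mdl_number_of_signals(eigvals, n):
    """MDL rule of the Wax-Kailath type: minimise over k the criterion
    -n (p - k) log(geometric mean / arithmetic mean of the p - k smallest
    eigenvalues) + 0.5 k (2p - k) log n."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = lam.size
    scores = []
    for k in range(p):
        tail = lam[k:]
        geo = np.exp(np.mean(np.log(tail)))
        ari = np.mean(tail)
        scores.append(-n * (p - k) * np.log(geo / ari) + 0.5 * k * (2 * p - k) * np.log(n))
    return int(np.argmin(scores))
```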
Article
We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the $\ell_2$ loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.
Article
The existence of a limiting spectral distribution (LSD) for a large-dimensional sample covariance matrix generated by the vector autoregressive moving average (VARMA) model is established. In particular, we obtain explicit forms of the LSDs for random matrices generated by a first-order vector autoregressive (VAR(1)) model and a first-order vector moving average (VMA(1)) model, as well as random coefficients for VAR(1) and VMA(1). The parameters for these explicit forms are also estimated. Finally, simulations demonstrate that the results are effective.
Article
Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if $p(n)/n \to 0$. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if $p(n) \gg n$.
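A minimal sketch of assertion (b) in its simplest form: screen coordinates by sample variance, then run ordinary PCA on the retained submatrix. The choice of k is left to the user here; the paper discusses how the selection should be carried out.

```python
import numpy as np

def variance_screened_pca(X, k):
    """Keep the k coordinates with the largest sample variances, run PCA on that
    submatrix, and embed the leading eigenvector back into the full coordinate space."""
    idx = np.argsort(X.var(axis=0))[::-1][:k]     # coordinates with largest sample variances
    S_sub = np.cov(X[:, idx], rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S_sub)
    v = np.zeros(X.shape[1])
    v[idx] = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
    return v
```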
Article
Under the common principal component model $k$ covariance matrices $\mathbf{\Sigma}_1, \cdots, \mathbf{\Sigma}_k$ are simultaneously diagonalizable, i.e., there exists an orthogonal matrix $\mathbf{\beta}$ such that $\mathbf{\beta}'\mathbf{\Sigma}_i\mathbf{\beta} = \mathbf{\Lambda}_i$ is diagonal for $i = 1, \cdots, k$. In this article we give the asymptotic distribution of the maximum likelihood estimates of $\mathbf{\beta}$ and $\mathbf{\Lambda}_i$. Using these results, we derive tests for (a) equality of eigenvectors with a given set of orthonormal vectors, and (b) redundancy of $p - q$ (out of $p$) principal components. The likelihood-ratio test for simultaneous sphericity of $p - q$ principal components in $k$ populations is derived, and some of the results are illustrated by a biometrical example.
Article
Let $x_{(1)}$ denote the square of the largest singular value of an $n \times p$ matrix $X$, all of whose entries are independent standard Gaussian variates. Equivalently, $x_{(1)}$ is the largest principal component variance of the covariance matrix $X'X$, or the largest eigenvalue of a $p$-variate Wishart distribution on $n$ degrees of freedom with identity covariance. Consider the limit of large $p$ and $n$ with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}$, the distribution of $x_{(1)}$ approaches the Tracy–Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for $n$ and $p$ as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large-$p$ multivariate distribution theory may be easier to apply in practice than their fixed-$p$ counterparts.
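A quick Monte Carlo check of this centering and scaling; the reference mean and standard deviation quoted in the final comment are approximate values of the Tracy–Widom(1) law.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 200, 100, 500
mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (1.0 / np.sqrt(n - 1) + 1.0 / np.sqrt(p)) ** (1.0 / 3.0)

stats = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=(n, p))
    largest = np.linalg.eigvalsh(X.T @ X)[-1]     # largest eigenvalue of the Wishart matrix X'X
    stats[r] = (largest - mu) / sigma             # centred and scaled as in the abstract

# For the Tracy-Widom law of order 1, the mean and standard deviation are roughly -1.21 and 1.27.
print(np.round(stats.mean(), 2), np.round(stats.std(), 2))
```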
Article
Let $S = (1/n)\sum_{t=1}^{n} X(t)X(t)'$, where $X(1), \ldots, X(n)$ are $p \times 1$ random vectors with mean zero. When $X(t)$ $(t = 1, \ldots, n)$ are independently and identically distributed (i.i.d.) as multivariate normal with mean vector 0 and covariance matrix $\Sigma$, many authors have investigated the asymptotic expansions for the distributions of various functions of the eigenvalues of $S$. In this paper, we will extend the above results to the case when $\{X(t)\}$ is a Gaussian stationary process. Also we shall derive the asymptotic expansions for certain functions of the sample canonical correlations in multivariate time series. Applications of some of the results in signal processing are also discussed.
Article
Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the "large p, small n" setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size $\theta_{\mathrm{dia}}(n,p,k) = n/[k^2\log(p-k)]$; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size $\theta_{\mathrm{sdp}}(n,p,k) = n/[k\log(p-k)]$ is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter $\theta_{\mathrm{sdp}}(n,p,k)$ is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
Paul, D. and Johnstone, I. M. (2012). Augmented Sparse Principal Component Analysis for High Dimensional Data. Technical Report, available at arXiv:1202.1242v1.

van de Geer, S. A. (2016). Estimation and Testing under Sparsity. Springer.