Article

Combining eigenvalues and variation of eigenvectors for order determination

Authors: Wei Luo and Bing Li

Abstract

In applying statistical methods such as principal component analysis, canonical correlation analysis, and sufficient dimension reduction, we need to determine how many eigenvectors of a random matrix are important for estimation. This problem is known as order determination, and amounts to estimating the rank of a matrix. Previous order-determination procedures rely either on the decreasing pattern, or elbow, of the eigenvalues, or on the increasing pattern of the variability in the directions of the eigenvectors. In this paper we propose a new order-determination procedure by exploiting both patterns: when the eigenvalues of a random matrix are close together, their eigenvectors tend to vary greatly; when the eigenvalues are far apart, their variability tends to be small. The combination of both helps to pinpoint the rank of a matrix more precisely than the previous methods. We establish the consistency of the new order-determination procedure, and compare it with other such procedures by simulation and in an applied setting.
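As an editorial illustration of the idea summarized in the abstract, the following minimal Python/NumPy sketch combines the rescaled eigenvalue curve of a sample covariance matrix with the bootstrap variability of the span of its leading eigenvectors, and returns the minimizer of the combined curve. It is only a sketch of the general principle, not the authors' estimator: the span-discrepancy measure, the rescalings, the number of bootstrap draws, and the candidate range kmax are all illustrative assumptions.

import numpy as np

def ladle_sketch(X, n_boot=200, kmax=None, seed=None):
    """Toy order determination for a sample covariance matrix: add the
    rescaled eigenvalue curve and the bootstrap variability of the span of
    the leading eigenvectors, then take the minimizer. Illustrative only."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kmax = kmax if kmax is not None else min(p - 1, 10)

    def leading(A, k):
        # first k eigenvectors of a symmetric matrix, largest eigenvalues first
        return np.linalg.eigh(A)[1][:, ::-1][:, :k]

    S = np.cov(X, rowvar=False)
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]
    phi = lam[:kmax + 1] / (1.0 + np.sum(lam[:kmax + 1]))    # eigenvalue part: phi[k] uses lambda_{k+1}

    f = np.zeros(kmax + 1)                                   # eigenvector part: f[0] = 0 by convention
    for _ in range(n_boot):
        Sb = np.cov(X[rng.integers(0, n, n)], rowvar=False)  # bootstrap replicate
        for k in range(1, kmax + 1):
            B, Bb = leading(S, k), leading(Sb, k)
            f[k] += 1.0 - abs(np.linalg.det(B.T @ Bb))       # 0 when the two spans coincide
    f = f / n_boot
    f = f / (1.0 + np.sum(f))                                # put f on a scale comparable with phi

    return int(np.argmin(f + phi))                           # estimated order

# Example: 3 signal directions plus isotropic noise in 8 dimensions
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 8)) * 3
X = X + rng.standard_normal((500, 8))
print(ladle_sketch(X, seed=1))   # typically prints 3

The combined curve is small only where both ingredients agree: before the true order the eigenvalue term is large, and past the true order the eigenvector-variation term is large.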


... Most linear dimension reduction methods proposed in the literature can be formulated using an appropriate pair of scatter matrices, see e.g. Ye and Weiss (2003), Tyler et al. (2009), Bura and Yang (2011), Liski et al. (2014) and Luo and Li (2016). The eigen-decomposition of one scatter matrix with respect to another is then often used to determine the dimension of the signal subspace and to separate signal and noise parts of the data. ...
... In other general approaches, Ye and Weiss (2003) considered eigenvectors rather than eigenvalues and proposed an estimation procedure based on the bootstrap variation of the subspace estimates for different dimensions. In a general approach, Luo and Li (2016) combined the eigenvalues and the bootstrap variation of eigenvectors for consistent estimation of the dimension. The last two approaches are based on the notion that the variation of eigenvectors is large for eigenvalues that are close together, while it tends to be small for eigenvalues that are far apart. ...
... Other examples of supervised dimension reduction methods are canonical correlation analysis (CCA), the sliced average variance estimate (SAVE) and principal Hessian directions (PHD), and they can all be formulated using two scatter matrices. For these methods and the estimation of the dimension of the signal subspace, also with regular bootstrap sampling, see Li (1991), Cook and Weisberg (1991), Li (1992), Bura and Cook (2001), Cook (2004), Zhu et al. (2006, 2010), Bura and Yang (2011) and Luo and Li (2016), and the references therein. For a nice review on supervised dimension reduction, see Ma and Zhu (2013). ...
Preprint
Most linear dimension reduction methods proposed in the literature can be formulated using an appropriate pair of scatter matrices, see e.g. Ye and Weiss (2003), Tyler et al. (2009), Bura and Yang (2011), Liski et al. (2014) and Luo and Li (2016). The eigen-decomposition of one scatter matrix with respect to another is then often used to determine the dimension of the signal subspace and to separate signal and noise parts of the data. Three popular dimension reduction methods, namely principal component analysis (PCA), fourth order blind identification (FOBI) and sliced inverse regression (SIR) are considered in detail and the first two moments of subsets of the eigenvalues are used to test for the dimension of the signal space. The limiting null distributions of the test statistics are discussed and novel bootstrap strategies are suggested for the small sample cases. In all three cases, consistent test-based estimates of the signal subspace dimension are introduced as well. The asymptotic and bootstrap tests are compared in simulations and illustrated in real data examples.
... The first continues our use of the logic of Weyl's inequality to propose a consistent estimate as T → ∞. The second uses the ladle plot method of Luo and Li (2016). We did not verify the required assumptions in Luo and Li (2016), so we do not claim their estimator is consistent in our problem, but we find in practice (as do they) that the estimator performs well, so we suggest practitioners actually use this. ...
... The second uses the ladle plot method of Luo and Li (2016). We did not verify the required assumptions in Luo and Li (2016), so we do not claim their estimator is consistent in our problem, but we find in practice (as do they) that the estimator performs well, so we suggest practitioners actually use this. ...
... 3.6.2 Luo and Li (2016): Ladle Plot. Recent work by Luo and Li (2016), however, has been shown to have more appealing finite-sample performance, and in Appendix E we provide the algorithm to estimate the rank with this method. The intuition for their approach is as follows. ...
Article
A common approach to modelling networks assigns each node to a position on a low-dimensional manifold where distance is inversely proportional to connection likelihood. More positive manifold curvature encourages more and tighter communities; negative curvature induces repulsion. We consistently estimate manifold type, dimension, and curvature from simply connected, complete Riemannian manifolds of constant curvature. We represent the graph as a noisy distance matrix based on the ties between cliques, then develop hypothesis tests to determine whether the observed distances could plausibly be embedded isometrically in each of the candidate geometries. We apply our approach to datasets from economics and neuroscience.
... Category (1) is by far the more popular one (as general methods are more difficult to come by) and usually accomplishes the estimation by exploiting asymptotic properties of eigenvalues of certain matrices, see, e.g., [15,18] for PCA and [2,26] for dimension reduction in the context of regression (sufficient dimension reduction). Methods belonging to category (2), in contrast, tend to be based on various bootstrapping and related procedures, see [12,23] and, in particular, [13], whose augmentation procedure serves as a starting point for the current work. ...
... This improvement both allows us to get quantitative results concerning the eigenvectors and sheds some light on the interplay of the augmentation procedure with the true dimensionality of the data. (iv) To accompany the augmentation estimator, we also propose an alternative estimator of the latent dimensionality based on the "ladle" procedure [12]. ...
... As a competitor to the augmentation strategy presented in Section 3.3, we introduce a generalization of the bootstrap-based "ladle" technique for extracting information from the eigenvectors of the variation matrix presented in [12] for vector-valued observations. The general idea is to use bootstrap resampling techniques to approximate the variation of the span of the first k eigenvectors of the corresponding sample scatter matrix, where high variation of the span indicates that the chosen eigenvectors belong to the same eigenspace, i.e., that the difference between the corresponding eigenvalues is small, see [23]. ...
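As a concrete illustration of the span-variation idea described in the excerpt above, the short Python/NumPy function below measures the discrepancy between two k-dimensional subspaces through their projection matrices; the normalization and the single bootstrap replicate in the example are illustrative choices, not the procedure of [12] or [23].

import numpy as np

def span_discrepancy(B1, B2):
    """Discrepancy between the subspaces spanned by the orthonormal columns
    of B1 and B2 (both p x k): squared Frobenius distance between the two
    projection matrices, rescaled to lie in [0, 1]. Small values mean the
    spans nearly coincide."""
    k = B1.shape[1]
    P1, P2 = B1 @ B1.T, B2 @ B2.T
    return float(np.sum((P1 - P2) ** 2) / (2 * k))

# Leading 2-dimensional eigenspace of a sample covariance vs. one bootstrap replicate
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 6))
Xb = X[rng.integers(0, 300, 300)]
lead = lambda A, k: np.linalg.eigh(A)[1][:, ::-1][:, :k]
print(span_discrepancy(lead(np.cov(X, rowvar=False), 2),
                       lead(np.cov(Xb, rowvar=False), 2)))

Because the example data are isotropic noise, all eigenvalues are close together, so the printed discrepancy tends to be large; this is exactly the behaviour that the resampling-based procedures above exploit to flag eigenvectors belonging to a common eigenspace.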
Preprint
Full-text available
Tensor-valued data benefits greatly from dimension reduction as the reduction in size is exponential in the number of modes. To achieve maximal reduction without loss in information, our objective in this work is to give an automated procedure for the optimal selection of the reduced dimensionality. Our approach combines a recently proposed data augmentation procedure with the higher-order singular value decomposition (HOSVD) in a tensorially natural way. We give theoretical guidelines on how to choose the tuning parameters and further inspect their influence in a simulation study. As our primary result, we show that the procedure consistently estimates the true latent dimensions under a noisy tensor model, both at the population and sample levels. Additionally, we propose a bootstrap-based alternative to the augmentation estimator. Simulations are used to demonstrate the estimation accuracy of the two methods under various settings.
... To formulate these intuitions, we introduce the concept of O_P^+(1) for a "large" stochastic sequence; that is, a sequence of random variables {Z_n : n = 1, 2, . . .} is O_P^+(1) if Z_n > δ with probability converging to one for some δ > 0. We refer to [15] for more details about this concept. ...
... , if we set π_{A,0} appropriately such that π_{A,0} + n^{-1/2} π_{A,0}^{-1} → 0 and set c > 1 in (15), then η(·) tends to be minimized uniquely at K. Let K̂ be the minimizer of η(·); by Theorem 3, K̂ is a consistent estimator of K. ...
... Step 3: Calculate η(·) as in (15) and minimize η(·) to derive K̂. ...
... The ladle estimator, which integrates both eigenvalue and eigenvector information to estimate matrix rank, was first introduced in Luo and Li (2016) for multivariate data. We extend this approach to functional data based on the key observation that the variability of the estimated eigenfunctions increases sharply when their index exceeds the true rank d, while the corresponding estimated eigenvalues exhibit a steep drop. ...
... We extend this approach to functional data based on the key observation that the variability of the estimated eigenfunctions increases sharply when their index exceeds the true rank d, while the corresponding estimated eigenvalues exhibit a steep drop. Unlike the multivariate setting studied by Luo and Li (2016), we need to account for the intrinsic infinite-dimensional nature of functional data and sparse observations that are contaminated by measurement errors. In particular, we estimate the mean and covariance functions by applying a local linear smoother to aggregated observations. ...
Preprint
Dimension reduction is often necessary in functional data analysis, with functional principal component analysis being one of the most widely used techniques. A key challenge in applying these methods is determining the number of eigen-pairs to retain, a problem known as order determination. When a covariance function admits a finite representation, the challenge becomes estimating the rank of the associated covariance operator. While this problem is straightforward when the full trajectories of functional data are available, in practice, functional data are typically collected discretely and are subject to measurement error contamination. This contamination introduces a ridge to the empirical covariance function, which obscures the true rank of the covariance operator. We propose a novel procedure to identify the true rank of the covariance operator by leveraging the information of eigenvalues and eigenfunctions. By incorporating the nonparametric nature of functional data through smoothing techniques, the method is applicable to functional data collected at random, subject-specific points. Extensive simulation studies demonstrate the excellent performance of our approach across a wide range of settings, outperforming commonly used information-criterion-based methods and maintaining effectiveness even in high-noise scenarios. We further illustrate our method with two real-world data examples.
... We now discuss how to determine the number of factors K and the dimension L of the central subspace S_{y|f}. The problem is commonly called order determination in the literature of dimension reduction (Luo and Li, 2016). ...
... To estimate the dimension L of the central subspace S_{y|f}, multiple methods have been proposed, including the sequential tests (Li, 1991; Li and Wang, 2007), the bootstrap procedure (Ye and Weiss, 2003), the cross-validation method (Xia et al., 2002; Wang and Xia, 2008), the BIC-type procedure (Zhu et al., 2006), and the ladle estimator (Luo and Li, 2016), among which we adopt the BIC-type procedure and extend it to the high-dimensional case. ...
Preprint
We consider forecasting a single time series using a large number of predictors in the presence of a possible nonlinear forecast function. Assuming that the predictors affect the response through the latent factors, we propose to first conduct factor analysis and then apply sufficient dimension reduction on the estimated factors, to derive the reduced data for subsequent forecasting. Using directional regression and the inverse third-moment method in the stage of sufficient dimension reduction, the proposed methods can capture the non-monotone effect of factors on the response. We also allow a diverging number of factors and only impose general regularity conditions on the distribution of factors, avoiding the undesired time reversibility of the factors by the latter. These make the proposed methods fundamentally more applicable than the sufficient forecasting method in Fan et al. (2017). The proposed methods are demonstrated in both simulation studies and an empirical study of forecasting monthly macroeconomic data from 1959 to 2016. Also, our theory contributes to the literature of sufficient dimension reduction, as it includes an invariance result, a path to perform sufficient dimension reduction under the high-dimensional setting without assuming sparsity, and the corresponding order-determination procedure.
... As a representative example, the sliced inverse regression method in the ultra-high-dimensional sparse data setting [5,51,56,57,81,97] and the high-dimensional non-sparse data setting [12,26,56,66] [4,47,55]), or one can use the scree plot generated by ordering all eigenvalues of M from largest to smallest and construct a Bayes information criterion (see [70,105]). The most recent order-determination methods [62,63] [85]. In applications, researchers usually apply sequentially the principal gradient method, the minimum average variance estimation method, the refined minimum average [71] gives a class of regularized estimation methods: ...
... the low-dimensional structure of T | X, and the advantage that the resulting low-dimensional predictors attain the minimum dimension allowed by (5.4). Reference [61] also discusses necessary and sufficient conditions for the uniqueness of the locally efficient dimension-reduction subspace. Based on the double inverse regression method, the order-determination problem for the locally efficient dimension-reduction subspace can be handled with methods such as [62,63]. Since (5.4) [20,22,31], as well as sufficient dimension reduction in the survival analysis setting [19], are naturally connected. ...
... The backbone network is built using the first K principal components, constructed as described above. The K we use is determined by a so-called ladle plot, with the goal of selecting a cutoff yielding an "optimal" low-dimensional representation (see Luo and Li (2016) for details). We also redo the analysis just dropping jati and geography and keeping temple in Supplemental Appendix Figure S2. ...
... To select the optimal number of principal components, the literature has usually relied on a cutoff based on patterns of either decreasing eigenvalues or increasing variability of eigenvectors. Luo and Li (2016) combine these two approaches to better estimate the optimal K. They propose a new estimator, called the ladle. For a pair ij in village v, we compute the weighted sum of its projections on the first K principal components as ...
Preprint
Full-text available
Social and economic networks are often multiplexed, meaning that people are connected by different types of relationships -- such as borrowing goods and giving advice. We make three contributions to the study of multiplexing. First, we document empirical multiplexing patterns in Indian village data: relationships such as socializing, advising, helping, and lending are correlated but distinct, while commonly used proxies for networks based on ethnicity and geography are nearly uncorrelated with actual relationships. Second, we examine how these layers and their overlap affect information diffusion in a field experiment. The advice network is the best predictor of diffusion, but combining layers improves predictions further. Villages with greater overlap between layers (more multiplexing) experience less overall diffusion. This leads to our third contribution: developing a model and theoretical results about diffusion in multiplex networks. Multiplexing slows the spread of simple contagions, such as diseases or basic information, but can either impede or enhance the spread of complex contagions, such as new technologies, depending on their virality. Finally, we identify differences in multiplexing by gender and connectedness. These have implications for inequality in diffusion-mediated outcomes such as access to information and adherence to norms.
... (ii) Variation of the previous estimator where the null distributions are bootstrapped instead of relying on asymptotic approximations. (iii) The general-purpose procedure for inferring the rank of a matrix from its sample estimate known as the ladle, which we apply to select scatter matrices, see Luo and Li (2016). (iv) The SURE-estimator of Ulfarsson and Solo (2015), which can be seen as the non-robust version of our proposed estimator. ...
... We used 200 bootstrap samples throughout the study, the default value in the implementation in ICtest. (iii) The ladle estimator of Luo and Li (2016) which, too, can be based on any of the four scatter matrices. The estimator is based on resampling, for which we used the default value 200 in the implementation in ICtest. ...
Article
Full-text available
The estimation of signal dimension under heavy-tailed latent variable models is studied. As a primary contribution, robust extensions of an earlier estimator based on Gaussian Stein’s unbiased risk estimation are proposed. These novel extensions are based on the framework of elliptical distributions and robust scatter matrices. Extensive simulation studies are conducted in order to compare the novel methods with several well-known competitors in both estimation accuracy and computational speed. The novel methods are applied to a financial asset return data set.
... The resulting Ω_1 is n^{1/2}-consistent and asymptotically normal. By Luo and Li (2016), this permits using the ladle estimator on Ω_1 to determine the dimension d of S_{Y|X}, given the identity between S(Ω_1) and S_{Y|X} discussed above. When the sample size is limited compared with the dimension p or the number of mixture components q, we also recommend using the predictor augmentation estimator (PAE; Luo and Li, 2021) to determine d, which does not require the asymptotic normality of Ω_1. ...
... The implementation of SAVE_M involves replacing the parameters of the mixture model in Ω_2 with the corresponding estimators in Section 2 and replacing the population moments with the sample moments. The resulting Ω_2 is n^{1/2}-consistent and asymptotically normal, by which the ladle estimator (Luo and Li, 2016) can be applied again to determine d. The leading d left singular vectors of Ω_2 then span an n^{1/2}-consistent estimator of S_{Y|X}. ...
Preprint
A major family of sufficient dimension reduction (SDR) methods, called inverse regression, commonly require the distribution of the predictor X to have a linear E(X | β^T X) and a degenerate var(X | β^T X) for the desired reduced predictor β^T X. In this paper, we adjust the first and second-order inverse regression methods by modeling E(X | β^T X) and var(X | β^T X) under the mixture model assumption on X, which allows these terms to convey more complex patterns and is most suitable when X has a clustered sample distribution. The proposed SDR methods build a natural path between inverse regression and the localized SDR methods, and in particular inherit the advantages of both; that is, they are √n-consistent, efficiently implementable, directly adjustable under the high-dimensional settings, and fully recover the desired reduced predictor. These findings are illustrated by simulation studies and a real data example at the end, which also suggest the effectiveness of the proposed methods for nonclustered data.
... We do not claim objective superiority -- just that our method is automatic, moderately scalable, sound in practice, and provides a useful visualisation of the bias/variance trade-off at play in the given data. In particular, we have found the ladle method by [62] to be comparable in all metrics, except computational scalability (as implemented), which has made analysing some of the larger datasets in this paper by this method impossible (e.g. min(n, p) much larger than 1000). ...
... For comparison, we also show the dimensions selected using the ladle [62] and elbow methods [100], as implemented in the R packages 'dimension' (on github: https://github.com/WenlanzZ) and 'igraph' (on The Comprehensive R Archive Network), respectively. ...
Preprint
Complex topological and geometric patterns often appear embedded in high-dimensional data and seem to reflect structure related to the underlying data source, with some distortion. We show that this rich data morphology can be explained by a generic and remarkably simple statistical model, demonstrating that manifold structure in data can emerge from elementary statistical ideas of dependence, correlation and latent variables. The Latent Metric Space model consists of a collection of random fields, evaluated at locations specified by latent variables and observed in noise. Driven by high dimensionality, principal component scores associated with data from this model are uniformly concentrated around a topological manifold, homeomorphic to the latent metric space. Under further assumptions this relation may be a diffeomorphism, a Riemannian metric structure appears, and the geometry of the manifold reflects that of the latent metric space. This provides statistical justification for manifold assumptions which underlie methods ranging from clustering and topological data analysis, to nonlinear dimension reduction, regression and classification, and explains the efficacy of Principal Component Analysis as a preprocessing tool for reduction from high to moderate dimension.
... This balanced slicing strategy helps maintain stable estimation for each slice. To determine the number of eigenvectors to retain in a data-driven manner, we employ the ladle estimator [13], which combines variability in estimated eigenvectors with the contribution of eigenvalues. Let B_k represent the first k eigenvectors of a candidate matrix M^{DR}_{n,m_n} associated with an SDR method. ...
Article
Full-text available
In high dimensional data analysis, directional regression is a widely used method for implementing linear sufficient dimension reduction by extracting core information from the complex data structure. However, extending sufficient dimension reduction techniques to handle multivariate response data remains a relatively challenging task. In this paper, we propose a novel method that integrates directional regression with the projective resampling framework to tackle the multivariate response regression problem. Our method, called projective resampling directional regression, not only improves estimation accuracy of the dimension reduction subspace but also offers greater flexibility of directional regression across diverse datasets. We establish theoretical properties of our method, including consistency and convergence rates. Comprehensive simulation studies under various scenarios, along with analyses of two real-world datasets, demonstrate the effectiveness and competitiveness of our approach.
... There are several methods to determine this value, including the variance proportion-based approach [36][37][38], the Bayesian information criterion-based approach [39], the reconstruction error-based approach [26], and the ladle estimator-based approach [40,41]. In this study, we adopt the 99% variance proportion-based approach due to its simplicity and efficiency. ...
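For reference, the cumulative variance-proportion rule adopted in the excerpt above can be written in a few lines of Python/NumPy; the 0.99 threshold follows the excerpt, while the function name and the toy eigenvalues are illustrative.

import numpy as np

def order_by_variance_proportion(eigenvalues, threshold=0.99):
    """Smallest k such that the k largest eigenvalues explain at least
    `threshold` of the total variance (the 99% rule mentioned above)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cum = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(cum, threshold) + 1)

# e.g. eigenvalues of a sample covariance matrix
print(order_by_variance_proportion([5.0, 2.0, 0.5, 0.05, 0.02]))   # -> 3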
Preprint
Full-text available
Surrogate models are extensively employed for forward and inverse uncertainty quantification in complex, computation-intensive engineering problems. Nonetheless, constructing high-accuracy surrogate models for complex dynamical systems with limited training samples continues to be a challenge, as capturing the variability in high-dimensional dynamical system responses with a small training set is inherently difficult. This study introduces an efficient Kriging modeling framework based on functional dimension reduction (KFDR) for conducting forward and inverse uncertainty quantification in dynamical systems. By treating the responses of dynamical systems as functions of time, the proposed KFDR method first projects these responses onto a functional space spanned by a set of predefined basis functions, which can deal with noisy data by adding a roughness regularization term. A few key latent functions are then identified by solving the functional eigenequation, mapping the time-variant responses into a low-dimensional latent functional space. Subsequently, Kriging surrogate models with noise terms are constructed in the latent space. With an inverse mapping established from the latent space to the original output space, the proposed approach enables accurate and efficient predictions for dynamical systems. Finally, the surrogate model derived from KFDR is directly utilized for efficient forward and inverse uncertainty quantification of the dynamical system. Through three numerical examples, the proposed method demonstrates its ability to construct highly accurate surrogate models and perform uncertainty quantification for dynamical systems accurately and efficiently.
... In this work, we revisit the classical problem of estimating the latent dimension in principal component analysis. Numerous solutions to this problem have been proposed in the literature, see, e.g., Luo and Li (2016); Nordhausen et al. (2021); Bernard and Verdebout (2024); Virta et al. (2024) for some recent works. The standard solutions are predominantly based on sequential subsphericity testing, information-theoretic criteria, or risk minimization. ...
Preprint
We propose a modified, high-dimensional version of a recent dimension estimation procedure that determines the dimension via the introduction of augmented noise variables into the data. Our asymptotic results show that the proposal is consistent in wide high-dimensional scenarios, and further shed light on why the original method breaks down when the dimension of either the data or the augmentation becomes too large. Simulations are used to demonstrate the superiority of the proposal to competitors both under and outside of the theoretical model.
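To make the augmentation idea in the abstract above concrete, here is a rough Python/NumPy sketch: noise columns are appended to the data, and for each candidate dimension one records how much of the leading eigenvectors' mass falls on the appended coordinates. The number of appended columns, the single augmentation draw, and reading the dimension off the curve by eye are simplifying assumptions; the estimators studied in the cited works are defined more carefully and also use the eigenvalues.

import numpy as np

def augmentation_curve(X, r=5, kmax=10, seed=None):
    """Append r standard-normal noise columns to X and, for each candidate
    dimension k, return the squared mass that the first k eigenvectors of
    the augmented sample covariance place on the appended (pure-noise)
    block. The curve stays near zero up to the true signal dimension and
    starts to grow once noise directions are included. Rough sketch only."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xa = np.hstack([X, rng.standard_normal((n, r))])
    vecs = np.linalg.eigh(np.cov(Xa, rowvar=False))[1][:, ::-1]
    return np.array([np.sum(vecs[p:, :k] ** 2) for k in range(kmax + 1)])

# Example: two signal directions among 6 observed variables
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 2)) @ rng.standard_normal((2, 6)) * 3
X = X + rng.standard_normal((400, 6))
print(np.round(augmentation_curve(X, seed=1), 3))   # near 0 for k <= 2, then grows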
... Note that the stochastic variability of the p − k eigenvectors depends strongly on how close together the corresponding eigenvalues are. In a similar context, [12] and [13] then suggested a kind of dual estimate of q based on the bootstrap variation of eigenvector estimates. ...
Preprint
Dimension reduction is often a preliminary step in the analysis of large data sets. The so-called non-Gaussian component analysis searches for a projection onto the non-Gaussian part of the data, and it is then important to know the correct dimension of the non-Gaussian signal subspace. In this paper we develop asymptotic as well as bootstrap tests for the dimension based on the popular fourth order blind identification (FOBI) method.
... Some heuristic rules are discussed, for example, in Alfons et al. [2024] and Radojicic and Nordhausen [2020], but inferential tools are still missing. So far, only Kankainen et al. [2007] propose some tests using COV − COV_4 for the case where all eigenvalues are equal in the Gaussian case (i.e., testing for multivariate normality), and Luo and Li [2016] and Radojicic and Nordhausen [2020] do so in a non-Gaussian component analysis framework for the equality of eigenvalues for components belonging to Gaussian components. Similar tests might be of interest also in model (1). ...
Preprint
Full-text available
Invariant Coordinate Selection (ICS) is a multivariate technique that relies on the simultaneous diagonalization of two scatter matrices. It serves various purposes, including its use as a dimension reduction tool prior to clustering or outlier detection. Unlike methods such as Principal Component Analysis, ICS has a theoretical foundation that explains why and when the identified subspace should contain relevant information. These general results have been examined in detail primarily for specific scatter combinations within a two-cluster framework. In this study, we expand these investigations to include more clusters and scatter combinations. The case of three clusters in particular is studied at length. Based on these expanded theoretical insights and supported by numerical studies, we conclude that ICS is indeed suitable for recovering Fisher's discriminant subspace under very general settings and cases of failure seem rare.
... , X_16 and the rescaled response as Y. Applying the method proposed by Luo and Li (2016), the structural dimension is determined to be 1. In line with the discussion for the Hitter's salary data, we consider two aspects here. ...
Article
Full-text available
Sufficient dimension reduction (SDR) primarily aims to reduce the dimensionality of high-dimensional predictor variables while retaining essential information about the responses. Traditional SDR methods typically employ kernel weighting functions, which unfortunately makes them susceptible to the curse of dimensionality. To address this issue, we in this paper propose novel forest-based approaches for SDR that utilize a locally adaptive kernel generated by Mondrian forests. Overall, our work takes the perspective of Mondrian forest as an adaptive weighted kernel technique for SDR problems. In the central mean subspace model, by integrating the methods from Xia et al. (J R Stat Soc Ser B (Stat Methodol) 64(3):363–410, 2002. https://doi.org/10.1111/1467-9868.03411) with Mondrian forest weights, we suggest the forest-based outer product of gradients estimation (mf-OPG) and the forest-based minimum average variance estimation (mf-MAVE). Moreover, we substitute the kernels used in nonparametric density function estimations (Xia in Ann Stat 35(6):2654–2690, 2007. https://doi.org/10.1214/009053607000000352), targeting the central subspace, with Mondrian forest weights. These techniques are referred to as mf-dOPG and mf-dMAVE, respectively. Under regularity conditions, we establish the asymptotic properties of our forest-based estimators, as well as the convergence of the affiliated algorithms. Through simulation studies and analysis of fully observable data, we demonstrate substantial improvements in computational efficiency and predictive accuracy of our proposals compared with the traditional counterparts.
... Thus, the sample variances of the eigenvalues over all blocks could be used to construct an asymptotic hypothesis test for the null hypothesis that some particular index of sources is noise. Similar strategies have been used for latent dimension estimation in unsupervised dimension reduction of iid data [35,23,36], and second-order source separation [25,44]. Some first steps in this direction in an NSS context are made in [20,41]. ...
... In practice, we need to estimate d in order to construct a basis for S_{E(Y|X)}. An advantage of OPCG is that we can apply the recently developed order determination methods based on eigenvalues and the variation of eigenvectors, such as the Ladle estimator or the Predictor Augmentation method [19,20]. Since M-MADE estimates β directly without use of eigenvalues and eigenvectors, the Ladle plot and Predictor Augmentation methods are not applicable. ...
... variability of eigenvectors [2] or on the combination of the two [3]. Inferential tools based on eigenvalues as well as information-theoretic criteria in models assuming Gaussian signal and noise were considered for example in [4]-[6]. ...
... The main idea here is that all Gaussian components have an ICS-eigenvalue of 1, hence the variance among these eigenvalues should be small. Similarly, resampling-based estimates of l for this scatter combination are considered in Luo & Li (2016, 2021). These tests with a joint estimation strategy were extended to a resampling framework for all scatter combinations in an NGCA setting in Radojičić & Nordhausen (2019). ...
Preprint
Full-text available
For high-dimensional data or data with noise variables, tandem clustering is a well-known technique that aims to improve cluster identification by first reducing the dimension. However, the usual approach using principal component analysis (PCA) has been criticized for focusing only on inertia so that the first components do not necessarily retain the structure of interest for clustering. To overcome this drawback, we propose a new tandem clustering approach based on invariant coordinate selection (ICS). By jointly diagonalizing two scatter matrices, ICS is designed to find structure in the data while returning affine invariant components. Some theoretical results have already been derived and guarantee that under some elliptical mixture models, the structure of the data can be highlighted on a subset of the first and/or last components. Nevertheless, ICS has received little attention in a clustering context. Two challenges are the choice of the pair of scatter matrices and the selection of the components to retain. For clustering purposes, we demonstrate that the best scatter pairs consist of one scatter matrix that captures the within-cluster structure and another that captures the global structure. For the former, local shape or pairwise scatters are of great interest, as is the minimum covariance determinant (MCD) estimator based on a carefully selected subset size that is smaller than usual. We evaluate the performance of ICS as a dimension reduction method in terms of preserving the cluster structure present in data. In an extensive simulation study and in empirical applications with benchmark data sets, we compare different combinations of scatter matrices, component selection criteria, and the impact of outliers. Overall, the new approach of tandem clustering with ICS shows promising results and clearly outperforms the approach with PCA.
... Benefiting from the use of the response transformation, the proposed method can work with many types of response, such as continuous, discrete, or categorical data. Our method may also be able to handle the order determination problem in many situations, which merits further investigation; for example, as in Luo and Li (2016), if the importance of the H slices has been determined in advance with a diverging number of slices H. ...
Preprint
Full-text available
Addressing the simultaneous identification of contributory variables while controlling the false discovery rate (FDR) in high-dimensional data is a crucial statistical challenge. In this paper, we propose a novel model-free variable selection procedure in sufficient dimension reduction framework via a data splitting technique. The variable selection problem is first converted to a least squares procedure with several response transformations. We construct a series of statistics with global symmetry property and leverage the symmetry to derive a data-driven threshold aimed at error rate control. Our approach demonstrates the capability for achieving finite-sample and asymptotic FDR control under mild theoretical conditions. Numerical experiments confirm that our procedure has satisfactory FDR control and higher power compared with existing methods.
... Thus, the sample variances of the eigenvalues over all blocks could be used to construct an asymptotic hypothesis test for the null hypothesis that some particular index of sources is noise. Similar strategies have been used for latent dimension estimation in unsupervised dimension reduction of iid data [34,24,35], and second-order source separation [26,43]. Some first steps in this direction in an NSS context are made in [21,41]. ...
Preprint
Non-stationary source separation is a well-established branch of blind source separation with many different methods. However, for none of these methods large-sample results are available. To bridge this gap, we develop large-sample theory for NSS-JD, a popular method of non-stationary source separation based on the joint diagonalization of block-wise covariance matrices. We work under an instantaneous linear mixing model for independent Gaussian non-stationary source signals together with a very general set of assumptions: besides boundedness conditions, the only assumptions we make are that the sources exhibit finite dependency and that their variance functions differ sufficiently to be asymptotically separable. The consistency of the unmixing estimator and its convergence to a limiting Gaussian distribution at the standard square root rate are shown to hold under the previous conditions. Simulation experiments are used to verify the theoretical results and to study the impact of block length on the separation.
... Recently developed order-determination methods, such as the ladle estimator (Luo and Li, 2016) and the predictor augmentation estimator (Luo and Li, 2021), can also be directly applied to estimate d. ...
Preprint
We introduce a novel framework for nonlinear sufficient dimension reduction where both the predictor and the response are distributional data, which are modeled as members of a metric space. Our key step to achieving the nonlinear sufficient dimension reduction is to build universal kernels on the metric spaces, which results in reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we use the well-known quantile representation of the Wasserstein distance to construct the universal kernel; for multivariate distributions, we resort to the recently developed sliced Wasserstein distance to achieve this purpose. Since the sliced Wasserstein distance can be computed by aggregation of quantile representation of the univariate Wasserstein distance, the computation of multivariate Wasserstein distance is kept at a manageable level. The method is applied to several data sets, including fertility and mortality distribution data and Calgary temperature data.
Article
Fréchet regression has received considerable attention to model metric‐space valued responses that are complex and non‐Euclidean data, such as probability distributions and vectors on the unit sphere. However, existing Fréchet regression literature focuses on the classical setting where the predictor dimension is fixed, and the sample size goes to infinity. This paper proposes sparse Fréchet sufficient dimension reduction with graphical structure among high‐dimensional Euclidean predictors. In particular, we propose a convex optimization problem that leverages the graphical information among predictors and avoids inverting the high‐dimensional covariance matrix. We also provide the Alternating Direction Method of Multipliers (ADMM) algorithm to solve the optimization problem. Theoretically, the proposed method achieves subspace estimation and variable selection consistency under suitable conditions. Extensive simulations and a real data analysis are carried out to illustrate the finite‐sample performance of the proposed method.
Article
Full-text available
Ultra high-dimensional datasets, which refer to scenarios where the number of covariates grows at an exponential rate relative to the sample size, are frequently encountered in modern data analysis across fields such as genomics, finance, and social sciences. These datasets pose significant challenges due to the large number of variables relative to the number of observations, potentially resulting in issues such as multicollinearity, overfitting, and computational difficulties. Traditional sufficient dimension reduction (SDR) methods struggle with these challenges, making it necessary to develop new approaches. To address these limitations, we introduce a graphical model-based SDR method that incorporates a smoothly clipped absolute deviation (SCAD) penalty. This method effectively reduces dimensionality while managing sparsity in the dataset. Additionally, we extend directional regression for high-dimensional data by integrating them with graphical LASSO, which enhances the model’s ability to estimate sparse precision matrices. This combined approach not only mitigates computational infeasibility in estimating covariance matrices but also helps avoid overfitting, making it particularly suitable for high-dimensional contexts. Through extensive simulation studies and real-world data analyses, we validate the robustness and effectiveness of our proposed methods. Moreover, we provide a theoretical framework that discusses the convergence rate of these methods, offering insights into their performance under various conditions. Finally, we outline potential avenues for future research, including exploring alternative penalty functions and expanding the applicability of these methods to other types of data structures.
Chapter
In this work, we consider the problem of testing the null hypotheses H_{0q}: λ_{q,V} > λ_{q+1,V} = ... = λ_{p,V}, where λ_{1,V} ≥ ... ≥ λ_{p,V} are the ordered eigenvalues of the shape matrix V of an elliptical distribution. We propose a class of tests based on signed-rank statistics. Our new tests are constructed (i) to keep the nice properties of the tests introduced in Bernard and Verdebout (J Multivar Anal, 2023) for the problem, (ii) to improve the detection ability of the same tests in Bernard and Verdebout (J Multivar Anal, 2023) against alternatives of the form H_{1q}: λ_{q,V} = λ_{q+1,V} = ... = λ_{p,V}, and (iii) to improve the robustness to outliers and heavy tails in the data-generating process of the pseudo-Gaussian test proposed in Bernard and Verdebout (Stat Sin, 2024). We show through Monte-Carlo simulations that our new tests achieve these objectives.
Article
Surrogate models have been widely used in the uncertainty propagation and global sensitivity analysis of complex evaluation-expensive engineering problems. However, the construction of high-accuracy surrogate models for time-variant problems with high-dimensional inputs and outputs using a small number of training samples remains a challenge. To address this challenge, we propose a dimension reduction-based Kriging modeling (KMDR) method for high-dimensional time-variant uncertainty propagation and global sensitivity analysis. Singular value decomposition is performed on the original time-variant response to extract principal components from high-dimensional outputs. And the improved sufficient dimension reduction (ISDR) is performed on high-dimensional inputs to identify the latent input space with respect to each principal component of outputs. A ladle estimator with a rigorous mathematical definition is then employed to determine the number of principal components of the outputs and the dimensionalities of the latent input spaces. The ladle estimator considers variabilities in both eigenvalues and eigenvectors of a matrix and can determine the latent dimensionality more accurately and efficiently than existing approaches. Subsequently, Kriging models between high-dimensional inputs and each principal component of the outputs are constructed based on a newly devised Kriging kernel, which embeds the information of the ISDR into the kernel function and can achieve higher accuracy than directly constructing Kriging models between latent inputs and outputs. In addition, a generalized variance-based sensitivity index, which can quantify the effects of the inputs on the overall time-variant response, is defined and computed directly from Sobol’ sensitivity indices of the inputs at each time node. Finally, the surrogate model of the time-variant system constructed by the KMDR is directly adopted for efficient time-variant uncertainty propagation and global sensitivity analysis. Several examples demonstrate that the proposed approach can construct more accurate surrogate models and obtain more accurate time-variant uncertainty propagation and global sensitivity analysis results than existing methods with a small training set.
Article
Since the pioneering work of sliced inverse regression, sufficient dimension reduction has been growing into a mature field in statistics and it has broad applications to regression diagnostics, data visualisation, image processing and machine learning. In this paper, we provide a review of several popular inverse regression methods, including the sliced inverse regression (SIR) method and the principal Hessian directions (PHD) method. In addition, we adopt a conditional characteristic function approach and develop a new class of slicing‐free methods, which are parallel to the classical SIR and PHD, and are named weighted inverse regression ensemble (WIRE) and weighted PHD (WPHD), respectively. The relationship with the recently developed martingale difference divergence matrix is also revealed. Numerical studies and a real data example show that the proposed slicing‐free alternatives have superior performance compared with SIR and PHD.
Article
Full-text available
Background Loyalty card data automatically collected by retailers provide an excellent source for evaluating health-related purchase behavior of customers. The data comprise information on every grocery purchase, including expenditures on product groups and the time of purchase for each customer. Such data where customers have an expenditure value for every product group for each time can be formulated as 3D tensorial data. Objective This study aimed to use the modern tensorial principal component analysis (PCA) method to uncover the characteristics of health-related purchase patterns from loyalty card data. Another aim was to identify card holders with distinct purchase patterns. We also considered the interpretation, advantages, and challenges of tensorial PCA compared with standard PCA. Methods Loyalty card program members from the largest retailer in Finland were invited to participate in this study. Our LoCard data consist of the purchases of 7251 card holders who consented to the use of their data from the year 2016. The purchases were reclassified into 55 product groups and aggregated across 52 weeks. The data were then analyzed using tensorial PCA, allowing us to effectively reduce the time and product group-wise dimensions simultaneously. The augmentation method was used for selecting the suitable number of principal components for the analysis. Results Using tensorial PCA, we were able to systematically search for typical food purchasing patterns across time and product groups as well as detect different purchasing behaviors across groups of card holders. For example, we identified customers who purchased large amounts of meat products and separated them further into groups based on time profiles, that is, customers whose purchases of meat remained stable, increased, or decreased throughout the year or varied between seasons of the year. Conclusions Using tensorial PCA, we can effectively examine customers’ purchasing behavior in more detail than with traditional methods because it can handle time and product group dimensions simultaneously. When interpreting the results, both time and product dimensions must be considered. In further analyses, these time and product groups can be directly associated with additional consumer characteristics such as socioeconomic and demographic predictors of dietary patterns. In addition, they can be linked to external factors that impact grocery purchases such as inflation and unexpected pandemics. This enables us to identify what types of people have specific purchasing patterns, which can help in the development of ways in which consumers can be steered toward making healthier food choices.
Article
Sliced inverse regression (SIR) has propelled sufficient dimension reduction (SDR) into a mature and versatile field with wide‐ranging applications in statistics, including regression diagnostics, data visualisation, image processing and machine learning. However, traditional inverse regression techniques encounter challenges associated with sparsity arising from slicing operations. Weighted inverse regression ensemble (WIRE) presents a novel slicing‐free approach to SDR. In this paper, we establish the asymptotic test theory to determine the dimension as estimated by WIRE. Moreover, we propose a permutation‐based method for determining the order. Extensive numerical studies and real data analysis confirm the excellent performance of the proposed order determination method based on WIRE.
Preprint
Full-text available
We introduce a sufficient graphical model by applying the recently developed nonlinear sufficient dimension reduction techniques to the evaluation of conditional independence. The graphical model is nonparametric in nature, as it does not make distributional assumptions such as the Gaussian or copula Gaussian assumptions. However, unlike a fully nonparametric graphical model, which relies on the high-dimensional kernel to characterize conditional independence, our graphical model is based on conditional independence given a set of sufficient predictors with a substantially reduced dimension. In this way we avoid the curse of dimensionality that comes with a high-dimensional kernel. We develop the population-level properties, convergence rate, and variable selection consistency of our estimate. By simulation comparisons and an analysis of the DREAM 4 Challenge data set, we demonstrate that our method outperforms the existing methods when the Gaussian or copula Gaussian assumptions are violated, and its performance remains excellent in the high-dimensional setting.
Article
In this article, we introduce a flexible model-free approach to sufficient dimension reduction analysis using the expectation of conditional difference measure. Without any strict conditions, such as the linearity condition or the constant covariance condition, the method estimates the central subspace exhaustively and efficiently under linear or nonlinear relationships between the response and the predictors. The method is especially meaningful when the response is categorical. We also study the √n-consistency and asymptotic normality of the estimate. The efficacy of our method is demonstrated through both simulations and a real data analysis.
Preprint
E3 ligases regulate key processes, but many of their roles remain unknown. Using Perturb-seq, we interrogated the function of 1,130 E3 ligases, partners and substrates in the inflammatory response in primary dendritic cells (DCs). Dozens impacted the balance of DC1, DC2, migratory DC and macrophage states and a gradient of DC maturation. Family members grouped into co-functional modules that were enriched for physical interactions and impacted specific programs through substrate transcription factors. E3s and their adaptors co-regulated the same processes, but partnered with different substrate recognition adaptors to impact distinct aspects of the DC life cycle. Genetic interactions were more prevalent within than between modules, and a deep learning model, comβVAE, predicts the outcome of new combinations by leveraging modularity. The E3 regulatory network was associated with heritable variation and aberrant gene expression in immune cells in human inflammatory diseases. Our study provides a general approach to dissect gene function.
Article
Sufficient dimension reduction reduces the dimension of a regression model without loss of information by replacing the original predictor with its lower-dimensional linear combinations. Partial (sufficient) dimension reduction arises when the predictors naturally fall into two sets X and W, and pursues a partial dimension reduction of X. Though partial dimension reduction is a very general problem, only very few research results are available when W is continuous. To the best of our knowledge, none can deal with the situation where the reduced lower-dimensional subspace of X varies with W. To address such issue, we in this paper propose a novel variable-dependent partial dimension reduction framework and adapt classical sufficient dimension reduction methods into this general paradigm. The asymptotic consistency of our method is investigated. Extensive numerical studies and real data analysis show that our variable-dependent partial dimension reduction method has superior performance compared to the existing methods.
Article
In data analysis using dimension reduction methods, the main goal is to summarize how the response is related to the covariates through a few linear combinations. One key issue is to determine the number of independent, relevant covariate combinations, which is the dimension of the sufficient dimension reduction (SDR) subspace. In this work, we propose an easily-applied approach to conduct inference for the dimension of the SDR subspace, based on augmentation of the covariate set with simulated pseudo-covariates. Applying the partitioning principle to the possible dimensions, we use rigorous sequential testing to select the dimensionality, by comparing the strength of the signal arising from the actual covariates to that appearing to arise from the pseudo-covariates. We show that under a "uniform direction" condition, our approach can be used in conjunction with several popular SDR methods, including sliced inverse regression. In these settings, the test statistic asymptotically follows a beta distribution and therefore is easily calibrated. Moreover, the family-wise type I error rate of our sequential testing is rigorously controlled. Simulation studies and an analysis of newborn anthropometric data demonstrate the robustness of the proposed approach, and indicate that the power is comparable to or greater than the alternatives.
Article
Sufficient dimension reduction (SDR) is a useful tool for high-dimensional data analysis. SDR aims at reducing the data dimensionality without loss of regression information between the response and its high-dimensional predictors. Many existing SDR methods are designed for data with continuous responses. Motivated by a recent work on aggregate dimension reduction (Wang in Stat Sin 30:1027–1048, 2020), we propose a unified SDR framework for both continuous and binary responses through a structured covariance ensemble. The connection with existing approaches is discussed in detail and an efficient algorithm is proposed. Numerical examples and a real data application demonstrate its satisfactory performance.
Article
Sufficient dimension reduction (SDR) methods target finding lower-dimensional representations of a multivariate predictor to preserve all the information about the conditional distribution of the response given the predictor. The reduction is commonly achieved by projecting the predictor onto a low-dimensional subspace. The smallest such subspace is known as the Central Subspace (CS) and is the key parameter of interest for most SDR methods. In this article, we propose a unified and flexible framework for estimating the CS in high dimensions. Our approach generalizes a wide range of model-based and model-free SDR methods to high-dimensional settings, where the CS is assumed to involve only a subset of the predictors. We formulate the problem as a quadratic convex optimization so that the global solution is feasible. The proposed estimation procedure simultaneously achieves the structural dimension selection and coordinate-independent variable selection of the CS. Theoretically, our method achieves dimension selection, variable selection, and subspace estimation consistency at a high convergence rate under mild conditions. We demonstrate the effectiveness and efficiency of our method with extensive simulation studies and real data examples.
Article
In this paper, we propose a criterion based on the variance variation of the sample eigenvalues to correctly estimate the number of significant components in high-dimensional principal component analysis (PCA); this number corresponds to the number of significant eigenvalues of the covariance matrix of the p-dimensional variables. Using random matrix theory, we derive consistency properties of the proposed criterion both when the significant eigenvalues tend to infinity and when the significant population eigenvalues are bounded. Numerical simulation shows that the probability that our variance variation criterion selects the correct number converges to 1 faster than for the criterion of Passemier and Yao [Estimation of the number of spikes, possibly equal, in the high-dimensional case. J. Multivariate Anal., (2014)] (PYC), AIC and BIC under the finite fourth moment condition as the dominant population eigenvalues tend to infinity. Moreover, in the case of a bounded maximum eigenvalue, once the gap condition is satisfied, the rate of convergence to 1 is faster than that of PYC and AIC, and the advantage over AIC is especially pronounced when the sample size is small. It is worth noting that the variance variation criterion significantly improves the accuracy of model selection compared with PYC and AIC when the random variable has a heavy-tailed distribution or its fourth moment does not exist.
Article
Full-text available
After introducing the approach to von Mises derivatives based on compact differentiation due to Reeds (1976), we show how non-parametric maximum likelihood estimators can often be defined by solving infinite dimensional score equations. Each component of the score equation corresponds to the derivative of the log likelihood for a one-dimensional parametric submodel. By means of examples we show that it usually is not possible to base consistency and asymptotic normality theorems on the implicit function theorem. However (in Part II), we show for a particular class of models, that once consistency (in a rather strong sense) has been established by other means, asymptotic normality and efficiency of the non-parametric maximum likelihood estimator can be established by the von Mises method. As an interlude we illustrate the use of the von Mises method in proving asymptotic correctness of the bootstrap.
Article
Full-text available
When considering the relationships between two sets of variates, the number of nonzero population canonical correlations may be called the dimensionality. In the literature, several tests for dimensionality in canonical correlation analysis are known. A comparison of seven sequential test procedures is presented, using results from a simulation study. The tests are compared with regard to the relative frequencies of underestimation, correct estimation, and overestimation of the true dimensionality. Some conclusions from the simulation results are drawn.
Article
Full-text available
Sliced inverse regression (SIR) is a renowned dimension reduction method for finding an effective low-dimensional linear subspace. Like many other linear methods, SIR can be extended to a nonlinear setting via the "kernel trick". The main purpose of this article is two-fold. We build kernel SIR in a reproducing kernel Hilbert space rigorously for a more intuitive model explanation and theoretical development. The second focus is on the implementation algorithm of kernel SIR for fast computation and numerical stability. We adopt a low-rank approximation to approximate the huge and dense full kernel covariance matrix and a reduced singular value decomposition technique for extracting kernel SIR directions. We also explore kernel SIR's ability to combine with other linear learning algorithms for classification and regression, including multiresponse regression. Numerical experiments show that kernel SIR is an effective kernel tool for nonlinear dimension reduction and it can easily combine with other linear algorithms to form a powerful toolkit for nonlinear data analysis. Index terms: dimension reduction, eigenvalue decomposition, kernel, reproducing kernel Hilbert space, singular value decomposition, sliced inverse regression, support vector machines.
Article
Full-text available
Sliced inverse regression is a promising method for the estimation of the central dimension-reduction subspace (CDR space) in semiparametric regression models. It is particularly useful in tackling cases with high-dimensional covariates. In this article we study the asymptotic behavior of the estimate of the CDR space with high-dimensional covariates, that is, when the dimension of the covariates goes to infinity as the sample size goes to infinity. Strong and weak convergence are obtained. We also suggest an estimation procedure of the Bayes information criterion type to ascertain the dimension of the CDR space and derive the consistency. A simulation study is conducted.
Article
Modern advances in computing power have greatly widened scientists' scope in gathering and investigating information from many variables, information which might have been ignored in the past. Yet to effectively scan a large pool of variables is not an easy task, although our ability to interact with data has been much enhanced by recent innovations in dynamic graphics. In this article, we propose a novel data-analytic tool, sliced inverse regression (SIR), for reducing the dimension of the input variable x without going through any parametric or nonparametric model-fitting process. This method explores the simplicity of the inverse view of regression; that is, instead of regressing the univariate output variable y against the multivariate x, we regress x against y. Forward regression and inverse regression are connected by a theorem that motivates this method. The theoretical properties of SIR are investigated under a model of the form $y = f(\beta_1 x, \ldots, \beta_K x, \varepsilon)$, where the $\beta_k$'s are unknown row vectors. This model looks like a nonlinear regression, except for the crucial difference that the functional form of f is completely unknown. For effectively reducing the dimension, we need only to estimate the space [effective dimension reduction (e.d.r.) space] generated by the $\beta_k$'s. This makes our goal different from the usual one in regression analysis, the estimation of all the regression coefficients. In fact, the $\beta_k$'s themselves are not identifiable without a specific structural form on f. Our main theorem shows that under a suitable condition, if the distribution of x has been standardized to have zero mean and identity covariance, the inverse regression curve, $E(x \mid y)$, will fall into the e.d.r. space. Hence a principal component analysis on the covariance matrix of the estimated inverse regression curve can be conducted to locate its main orientation, yielding our estimates for e.d.r. directions. Furthermore, we use a simple step function to estimate the inverse regression curve. No complicated smoothing is needed. SIR can be easily implemented on personal computers. By simulation, we demonstrate how SIR can effectively reduce the dimension of the input variable from, say, 10 to K = 2 for a data set with 400 observations. The spin-plot of y against the two projected variables obtained by SIR is found to mimic the spin-plot of y against the true directions very well. A chi-squared statistic is proposed to address the issue of whether or not a direction found by SIR is spurious.
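Since the SIR procedure just described (standardize the predictors, slice the response, average the standardized predictors within slices, and run a principal component analysis on the slice means) is frequently reimplemented, a minimal numpy sketch may be helpful. Names, defaults, and the slicing scheme are illustrative assumptions, not a reference implementation from the article.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Minimal sketch of sliced inverse regression (SIR).

    Whitens the predictors, averages them within slices of the response,
    and takes the leading eigenvectors of the between-slice covariance,
    mapped back to the original predictor scale.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ Sigma_inv_sqrt              # standardized predictors

    order = np.argsort(y)                      # slice by the ordered response
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)                # slice mean of Z
        M += (len(idx) / n) * np.outer(m, m)   # weighted between-slice covariance

    w, v = np.linalg.eigh(M)                   # eigenvalues in ascending order
    dirs = Sigma_inv_sqrt @ v[:, ::-1][:, :n_dirs]
    return dirs, w[::-1]                       # e.d.r. estimates and eigenvalues
```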
Article
A general regression problem is one in which a response variable can be expressed as some function of one or more different linear combinations of a set of explanatory variables as well as a random error term. Sliced inverse regression is a method for determining these linear combinations. In this article we address the problem of determining how many linear combinations are involved. Procedures based on conditional means and conditional covariance matrices, as well as a procedure combining the two approaches, are considered. In each case we develop a test that has an asymptotic chi-squared distribution when the vector of explanatory variables is sampled from an elliptically symmetric distribution.
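Sequential tests of this kind are usually applied as a loop over candidate dimensions: test dimension k = 0, 1, ... and stop at the first non-rejection. The sketch below uses the commonly quoted SIR-type statistic, n times the sum of the smallest sample eigenvalues, with (p - k)(H - 1 - k) degrees of freedom; the exact statistic and degrees of freedom differ across the procedures compared in the article, so treat both as assumptions to verify for a given method.

```python
import numpy as np
from scipy.stats import chi2

def sequential_dimension_test(eigenvalues, n, n_slices, alpha=0.05):
    """Schematic sequential test for the number of linear combinations.

    `eigenvalues` are the eigenvalues of the candidate matrix computed on the
    standardized predictor scale.  The statistic and degrees of freedom are
    the commonly quoted SIR forms and are assumptions, not those of any one
    procedure from the article.
    """
    lam = np.sort(np.asarray(eigenvalues))[::-1]
    p = len(lam)
    for k in range(p):
        df = (p - k) * (n_slices - 1 - k)
        if df <= 0:
            return k
        stat = n * lam[k:].sum()
        if chi2.sf(stat, df) > alpha:          # fail to reject dimension k
            return k
    return p
```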
Article
Modern graphical tools have enhanced our ability to learn many things from data directly. With much user-friendly graphical software available, we are encouraged to plot a lot more often than before. The benefits from direct interaction with graphics have been enormous. But trailing behind these high-tech advances is the issue of appropriate guidance on what to plot. There are too many directions in which to project a high-dimensional data set, and unguided plotting can be time-consuming and fruitless. In a recent article, Li set up a statistical framework for studying this issue, based on a notion of effective dimension reduction (edr) directions. They are the directions in which to project a high-dimensional input variable for the purpose of effectively viewing and studying its relationship with an output variable. A methodology, sliced inverse regression, was introduced and shown to be useful in finding edr directions. This article introduces another method for finding edr directions. It begins with the observation that the eigenvectors of the Hessian matrices of the regression function are helpful in the study of the shape of the regression surface. A notion of principal Hessian directions (pHd's) is defined that locates the main axes along which the regression surface shows the largest curvatures in an aggregate sense. We show that pHd's can be used to find edr directions. We further use the celebrated Stein lemma for suggesting estimates. The sampling properties of the estimated pHd's are obtained. A significance test is derived for suggesting the genuineness of a view found by our method. Some versions for implementing this method are discussed, and simulation results and an application to real data are reported. The relationship of this method with exploratory projection pursuit is also discussed.
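For readers who want to see the moment-based estimate suggested by Stein's lemma in code form, here is a rough numpy sketch of the response-based version of pHd: eigen-decompose the predictor covariance inverse times the response-weighted second-moment matrix. The function name and the ranking by absolute eigenvalue are illustrative choices, and the weighting variants discussed in the article are not reproduced.

```python
import numpy as np

def phd_directions(X, y, n_dirs=2):
    """Rough sketch of response-based principal Hessian directions (pHd).

    By Stein's lemma, for (near) normal predictors the matrix
    E[(y - ybar)(x - xbar)(x - xbar)^T] is proportional to the average
    Hessian of the regression function, so the eigenvectors of
    Sigma_x^{-1} times this matrix suggest e.d.r. directions.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Sigma = Xc.T @ Xc / n                      # predictor covariance
    Syxx = (Xc * yc[:, None]).T @ Xc / n       # response-weighted second moment
    w, v = np.linalg.eig(np.linalg.solve(Sigma, Syxx))
    order = np.argsort(-np.abs(w))             # largest curvature first
    return v[:, order[:n_dirs]].real, w[order].real
```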
Article
An asymptotic representation is presented here for differentiable statistical functions which implies a weak central limit theorem for the bootstrap.
Article
This article, which is based on an Interface tutorial, presents an overview of regression graphics, along with an annotated bibliography. The intent is to discuss basic ideas and issues without delving into methodological or theoretical details, and to provide a guide to the literature.
Article
In this paper we introduce a general theory for nonlinear sufficient dimension reduction, and explore its ramifications and scope. This theory subsumes recent work employing reproducing kernel Hilbert spaces, and reveals many parallels between linear and nonlinear sufficient dimension reduction. Using these parallels we analyze the properties of existing methods and develop new ones. We begin by characterizing dimension reduction at the general level of $\sigma$-fields and proceed to that of classes of functions, leading to the notions of sufficient, complete and central dimension reduction classes. We show that, when it exists, the complete and sufficient class coincides with the central class, and can be unbiasedly and exhaustively estimated by a generalized sliced inverse regression estimator (GSIR). When completeness does not hold, this estimator captures only part of the central class. However, in these cases we show that a generalized sliced average variance estimator (GSAVE) can capture a larger portion of the class. Both estimators require no numerical optimization because they can be computed by spectral decomposition of linear operators. Finally, we compare our estimators with existing methods by simulation and on actual data sets.
Article
Sliced inverse regression (SIR) was introduced by Li to find the effective dimension reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we propose a hybrid SIR method using a kernel machine, which we call kernel SIR. The kernel mixtures result in the transformed data distribution being more Gaussian-like and symmetric, providing more suitable conditions for performing SIR analysis. The proposed method can be regarded as a nonlinear extension of the SIR algorithm. We provide a theoretical description of the kernel SIR algorithm within the framework of reproducing kernel Hilbert space (RKHS). We also illustrate that kernel SIR performs better than several standard methods for discriminative, visualization, and regression purposes. We show how the features found with kernel SIR can be used for classification of microarray data and several other classification problems and compare the results with those obtained with several existing dimension reduction techniques. The results show that kernel SIR is a powerful nonlinear feature extractor for classification problems.
Book
I. Introduction.- II. Von Mises' Method.- 2.1 Statistical functionals.- 2.2 Von Mises expansions.- 2.3 Fréchet derivatives.- III. Hadamard Differentiation.- 3.1 Definitions of differentiability.- 3.2 An implicit function theorem.- IV. Some Probability Theory on C[0,1] and D[0,1].- 4.1 The spaces C[0,1] and D[0,1].- 4.2 Probability theory on C[0,1].- 4.3 Probability theory on D[0,1].- 4.4 Asymptotic Normality.- V. M-, L-, and R-Estimators.- 5.1 M-estimators.- 5.2 L-estimators.- 5.3 R-estimators.- 5.4 Modifications of elements of D[0,1].- VI. Calculus on Function Spaces.- 6.1 Differentiability theorems.- 6.2 An implicit function theorem for statistical functionals.- VII. Applications.- 7.1 M-estimators.- 7.2 L-estimators.- 7.3 R-estimators.- 7.4 Functionals on C[0,1]: sample quantiles.- 7.5 Truncated d.f.'s and modified estimators.- VIII. Asymptotic Efficiency.- 8.1 Asymptotic efficiency and Hadamard differentiability.- 8.2 Asymptotically efficient estimators of location.- References.- List of symbols.
Book
A comprehensive introduction to ICA for students and practitioners. Independent Component Analysis (ICA) is one of the most exciting new topics in fields such as neural networks, advanced statistics, and signal processing. This is the first book to provide a comprehensive introduction to this new technique complete with the fundamental mathematical background needed to understand and utilize it. It offers a general overview of the basics of ICA, important solutions and algorithms, and in-depth coverage of new applications in image processing, telecommunications, audio signal processing, and more. Independent Component Analysis is divided into four sections that cover: general mathematical concepts utilized in the book; the basic ICA model and its solution; various extensions of the basic ICA model; and real-world applications for ICA models. Authors Hyvärinen, Karhunen, and Oja are well known for their contributions to the development of ICA and here cover all the relevant theory, new algorithms, and applications in various fields. Researchers, students, and practitioners from a variety of disciplines will find this accessible volume both helpful and informative.
Article
In this paper we propose a dimension reduction method for estimating the directions in a multiple-index regression based on information extraction. This extends the recent work of Yin and Cook [X. Yin, R.D. Cook, Direction estimation in single-index regression, Biometrika 92 (2005) 371–384] who introduced the method and used it to estimate the direction in a single-index regression. While a formal extension seems conceptually straightforward, there is a fundamentally new aspect of our extension: We are able to show that, under the assumption of elliptical predictors, the estimation of multiple-index regressions can be decomposed into successive single-index estimation problems. This significantly reduces the computational complexity, because the nonparametric procedure involves only a one-dimensional search at each stage. In addition, we developed a permutation test to assist in estimating the dimension of a multiple-index regression.
Article
The bootstrap, discussed by Efron (1979, 1981), is a powerful tool for the nonparametric estimation of sampling distributions and asymptotic standard errors. We demonstrate consistency of the bootstrap distribution estimates for a general class of robust differentiable statistical functionals. Our conditions for consistency of the bootstrap are variants of previously considered criteria for robustness of the associated statistics. A general example shows that, for almost any location statistic, consistency of the bootstrap variance estimator requires a tail condition on the distribution from which samples are taken. A modification of Efron's estimator of standard error is shown to circumvent this problem.
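As a concrete reminder of the estimator whose consistency is at issue here, the following minimal sketch computes Efron's bootstrap estimate of the standard error of an arbitrary statistic by resampling the data with replacement; the function name and defaults are illustrative only.

```python
import numpy as np

def bootstrap_se(x, statistic, n_boot=1000, seed=None):
    """Minimal sketch of Efron's bootstrap standard-error estimate.

    `statistic` is any function of a one-dimensional sample (e.g. np.median).
    The bootstrap distribution is formed by resampling with replacement.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    reps = np.array([statistic(x[rng.integers(0, n, n)]) for _ in range(n_boot)])
    return reps.std(ddof=1), reps              # SE estimate and bootstrap replicates
```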
Article
In canonical correlation analysis the number of nonzero population correlation coefficients is called the dimensionality. Asymptotic distributions of the dimensionalities estimated by Mallows's criterion and Akaike's criterion are given for nonnormal multivariate populations with finite fourth moments. These distributions have a simple form in the case of elliptical populations, and modified criteria are proposed which adjust for nonzero kurtosis. An estimation method based on a marginal likelihood function for the dimensionality is introduced and the asymptotic distribution of the corresponding estimator is derived for multivariate normal populations. It is shown that this estimator is not consistent, but that a simple modification yields consistency. An overall comparison of the various estimation methods is conducted through simulation studies.
Article
We introduce a principal support vector machine (PSVM) approach that can be used for both linear and nonlinear sufficient dimension reduction. The basic idea is to divide the response variables into slices and use a modified form of support vector machine to find the optimal hyperplanes that separate them. These optimal hyperplanes are then aligned by the principal components of their normal vectors. It is proved that the aligned normal vectors provide an unbiased, $\sqrt{n}$-consistent, and asymptotically normal estimator of the sufficient dimension reduction space. The method is then generalized to nonlinear sufficient dimension reduction using the reproducing kernel Hilbert space. In that context, the aligned normal vectors become functions and it is proved that they are unbiased in the sense that they are functions of the true nonlinear sufficient predictors. We compare PSVM with other sufficient dimension reduction methods by simulation and in real data analysis, and through both comparisons firmly establish its practical advantages.
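A rough sketch of the linear PSVM recipe, slice the response, fit a linear SVM to each below/above dichotomy, and take principal components of the stacked normal vectors, is given below. The use of scikit-learn's LinearSVC, the crude standardization step, and the slicing scheme are illustrative assumptions rather than the estimator defined in the article.

```python
import numpy as np
from sklearn.svm import LinearSVC

def psvm_directions(X, y, n_slices=5, n_dirs=2, C=1.0):
    """Rough sketch of the linear principal support vector machine (PSVM) idea.

    Cuts the response at interior quantiles, fits a linear SVM to each
    below/above dichotomy, and takes principal components of the stacked
    normal vectors.
    """
    cuts = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # crude standardization
    normals = []
    for c in cuts:
        labels = (y > c).astype(int)
        svm = LinearSVC(C=C).fit(Xs, labels)
        normals.append(svm.coef_.ravel())      # normal vector of the hyperplane
    # Principal components of the normal vectors give the reduction directions
    _, _, Vt = np.linalg.svd(np.array(normals), full_matrices=False)
    return Vt[:n_dirs].T
```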
Article
Sufficient Dimension Reduction (SDR) in regression comprises the estimation of the dimension of the smallest (central) dimension reduction subspace and its basis elements. For SDR methods based on a kernel matrix, such as SIR and SAVE, the dimension estimation is equivalent to the estimation of the rank of a random matrix which is the sample-based estimate of the kernel. A test for the rank of a random matrix amounts to testing how many of its eigenvalues or singular values are equal to zero. We propose two tests based on the smallest eigenvalues or singular values of the estimated matrix: an asymptotic weighted chi-square test and a Wald-type asymptotic chi-square test. We also provide an asymptotic chi-square test for assessing whether elements of the left singular vectors of the random matrix are zero. These methods together constitute a unified approach for all SDR methods based on a kernel matrix that covers estimation of the central subspace and its dimension, as well as assessment of variable contribution to the lower-dimensional predictor projections, with variable selection as a special case. A small power simulation study shows that the proposed and existing tests, specific to each SDR method, perform similarly with respect to power and achievement of the nominal level. Also, the importance of the choice of the number of slices as a tuning parameter is further exhibited.
Article
Efron's "bootstrap" method of distribution approximation is shown to be asymptotically valid in a large number of situations, including t-statistics, the empirical and quantile processes, and von Mises functionals. Some counter-examples are also given, to show that the approximation does not always succeed.
Article
In many situations regression analysis is mostly concerned with inferring about the conditional mean of the response given the predictors, and less concerned with the other aspects of the conditional distribution. In this paper we develop dimension reduction methods that incorporate this consideration. We introduce the notion of the Central Mean Subspace (CMS), a natural inferential object for dimension reduction when the mean function is of interest. We study properties of the CMS, and develop methods to estimate it. These methods include a new class of estimators which requires fewer conditions than pHd, and which displays a clear advantage when one of the conditions for pHd is violated. CMS also reveals a transparent distinction among the existing methods for dimension reduction: OLS, pHd, SIR and SAVE. We apply the new methods to a data set involving recumbent cows.
Article
Dimension reduction in a regression analysis of response y given a p-dimensional vector of predictors x reduces the dimension of x by replacing it with a lower-dimensional linear combination f'x of the x's without specifying a parametric model and without loss of information about the conditional distribution of y given x. We unify three existing methods, sliced inverse regression (SIR), sliced average variance estimate (SAVE), and principal Hessian directions (pHd), into a larger class of methods. Each method estimates a particular candidate matrix, essentially a matrix of parameters. We introduce broad classes of dimension reduction candidate matrices, and we distinguish estimators of the matrices from the matrices themselves. Given these classes of methods and several ways to estimate any matrix, we now have the problem of selecting a particular matrix and estimation method. We propose bootstrap methodology to select among candidate matrices, estimators and dimension, and in particular we investigate linear combinations of different methods.
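The bootstrap selection idea, re-estimate the candidate subspace on bootstrap samples and prefer dimensions and estimators whose estimates vary little, can be sketched as follows. The estimator interface, the projection-matrix distance, and the defaults are placeholder assumptions for illustration, not the criterion exactly as proposed.

```python
import numpy as np

def bootstrap_subspace_variability(X, y, estimator, max_dim, n_boot=200, seed=None):
    """Hedged sketch of bootstrap-based selection among candidate dimensions.

    `estimator(X, y, k)` is a placeholder returning a p x k basis of the
    estimated k-dimensional subspace (e.g. leading SIR or SAVE directions).
    For each candidate k, the average distance between bootstrap estimates
    and the full-sample estimate measures how stable that choice is; small
    variability favours the corresponding dimension.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    variability = []
    for k in range(1, max_dim + 1):
        B0, _ = np.linalg.qr(estimator(X, y, k))        # full-sample basis
        dists = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)
            Bb, _ = np.linalg.qr(estimator(X[idx], y[idx], k))
            # compare subspaces through their projection matrices
            dists.append(np.linalg.norm(B0 @ B0.T - Bb @ Bb.T))
        variability.append(np.mean(dists))
    return variability
```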
Article
We introduce directional regression (DR) as a method for dimension reduction. Like contour regression, DR is derived from empirical directions, but achieves higher accuracy and requires substantially less computation. DR naturally synthesizes the dimension reduction estimators based on conditional moments, such as sliced inverse regression and sliced average variance estimation, and in doing so combines the advantages of these methods. Under mild conditions, it provides an exhaustive and $\sqrt{n}$-consistent estimate of the dimension reduction space. We develop the asymptotic distribution of the DR estimator, and from that a sequential test procedure to determine the dimension of the central space. We compare the performance of DR with that of existing methods by simulation and find strong evidence of its advantage over a wide range of models. Finally, we apply DR to analyze a data set concerning the identification of hand-written digits.
Article
In this paper we develop some econometric theory for factor models of large dimensions. The focus is the determination of the number of factors (r), which is an unresolved issue in the rapidly growing literature on multifactor models. We first establish the convergence rate for the factor estimates that will allow for consistent estimation of r. We then propose some panel criteria and show that the number of factors can be consistently estimated using the criteria. The theory is developed under the framework of large cross-sections (N) and large time dimensions (T). No restriction is imposed on the relation between N and T. Simulations show that the proposed criteria have good finite sample properties in many configurations of the panel data encountered in practice.
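For illustration, the panel-criterion approach can be sketched in a few lines: extract k principal-component factors, compute the residual variance V(k), and minimize ln V(k) plus a penalty that grows with k. The penalty below is written from the commonly cited IC_p2 form and should be checked against the paper; the function name and interface are assumptions.

```python
import numpy as np

def estimate_num_factors(X, max_k):
    """Sketch of choosing the number of factors with a panel information criterion.

    X is a T x N panel; k principal-component factors are extracted and the
    criterion ln V(k) + k * penalty is minimized.  The penalty follows the
    commonly cited IC_p2 form and is an assumption to verify.
    """
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ics = []
    for k in range(1, max_k + 1):
        fit = (U[:, :k] * s[:k]) @ Vt[:k]          # rank-k principal-component fit
        V_k = np.sum((Xc - fit) ** 2) / (N * T)    # residual variance
        penalty = k * ((N + T) / (N * T)) * np.log(min(N, T))
        ics.append(np.log(V_k) + penalty)
    return int(np.argmin(ics)) + 1, ics            # estimated r and criterion values
```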
Article
The central mean subspace (CMS) and iterative Hessian transformation (IHT) have been introduced recently for dimension reduction when the conditional mean is of interest. Suppose that X is a vector-valued predictor and Y is a scalar response. The basic problem is to find a lower-dimensional predictor $\eta^T X$ such that $E(Y|X)=E(Y|\eta^T X)$. The CMS defines the inferential object for this problem and IHT provides an estimating procedure. Compared with other methods, IHT requires fewer assumptions and has been shown to perform well when the additional assumptions required by those methods fail. In this paper we give an asymptotic analysis of IHT and provide stepwise asymptotic hypothesis tests to determine the dimension of the CMS, as estimated by IHT. Here, the original IHT method has been modified to be invariant under location and scale transformations. To provide empirical support for our asymptotic results, we will present a series of simulation studies. These agree well with the theory. The method is applied to analyze an ozone data set.
Article
We propose a novel approach to sufficient dimension reduction in regression, based on estimating contour directions of small variation in the response. These directions span the orthogonal complement of the minimal space relevant for the regression and can be extracted according to two measures of variation in the response, leading to simple and general contour regression (SCR and GCR) methodology. In comparison with existing sufficient dimension reduction techniques, this contour-based methodology guarantees exhaustive estimation of the central subspace under ellipticity of the predictor distribution and mild additional assumptions, while maintaining $\sqrt{n}$-consistency and computational ease. Moreover, it proves robust to departures from ellipticity. We establish population properties for both SCR and GCR, and asymptotic properties for SCR. Simulations to compare performance with that of standard techniques such as ordinary least squares, sliced inverse regression, principal Hessian directions and sliced average variance estimation confirm the advantages anticipated by the theoretical analyses. We demonstrate the use of contour-based methods on a data set concerning soil evaporation.
Principal Component Analysis, Second Edition
I. T. Jolliffe