Hui Zou
University of Minnesota Twin Cities | UMN · School of Statistics

PhD

About

126
Publications
44,307
Reads
37,461
Citations
Citations since 2017
42 Research Items
22,379 Citations
[Chart: citations per year, 2017–2023]

Publications (126)
Preprint
Full-text available
High-dimensional regression and regression with a left-censored response are each well-studied topics. In spite of this, few methods have been proposed which deal with both of these complications simultaneously. The Tobit model---long the standard method for censored regression in economics---has not been adapted for high-dimensional regression at...
Article
In statistical analysis, researchers often perform coordinatewise Gaussianization such that each variable is marginally normal. The normal score transformation is a method for coordinatewise Gaussianization and is widely used in statistics, econometrics, genetics and other areas. However, few studies exist on the theoretical properties of the norma...
Article
The support vector machine (SVM) is a popular classification method which enjoys good performance in many real applications. The SVM can be viewed as a penalized minimization problem in which the objective function is the expectation of the hinge loss function with respect to the standard non-smooth empirical measure corresponding to the true underlyin...
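As background, the penalized formulation referred to here can be written in its standard form (stated as a sketch, not this paper's exact notation):

$$\min_{f}\ \frac{1}{n}\sum_{i=1}^{n}\left[1 - y_i f(x_i)\right]_{+} + \lambda \|f\|^2,$$

where $[u]_{+} = \max(u, 0)$ gives the hinge loss, the empirical average is the expectation under the empirical measure, and $\lambda$ is a regularization parameter.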
Article
There is a vast amount of work on high dimensional regression. The common starting point for the existing theoretical work is to assume the data generating model is a homoscedastic linear regression model with some sparsity structure. In reality the homoscedasticity assumption is often violated, and hence understanding the heteroscedasticity of the...
Article
Motivated by the Golub-Heath-Wahba formula for ridge regression, we first present a new leave-one-out lemma for the kernel support vector machines (SVM) and related large-margin classifiers. We then use the lemma to design a novel and efficient algorithm, named “magicsvm”, for training the kernel SVM and related large-margin classifiers and computi...
Article
Many machine learning models have tuning parameters to be determined by the training data, and cross‐validation (CV) is perhaps the most commonly used method for selecting tuning parameters. This work concerns the problem of estimating the generalization error of a CV‐tuned predictive model. We propose to use an honest leave‐one‐out cross‐validatio...
Preprint
Huber regression (HR) is a popular robust alternative to the least squares regression when the error follows a heavy-tailed distribution. We propose a new method called the enveloped Huber regression (EHR) by considering the envelope assumption that there exists some subspace of the predictors that has no association with the response, which is ref...
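For reference, the Huber loss with transition parameter $\delta > 0$ (the standard definition, not notation taken from this preprint) is

$$\ell_{\delta}(r) = \begin{cases} \tfrac{1}{2} r^2, & |r| \le \delta,\\ \delta |r| - \tfrac{1}{2}\delta^2, & |r| > \delta, \end{cases}$$

quadratic for small residuals and linear for large ones, which is what makes it robust to heavy-tailed errors.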
Article
It is very interesting to learn the history of ridge analysis/ridge regression as well as stories of its inventors from Professor Hoerl’s article. The overview article has covered many important aspects of ridge regression, regularization more generally, and their modern applications. Ridge has indeed become an essential concept in data science. My...
Article
Expectile is a generalization of the expected value in probability and statistics. In finance and risk management, the expectile is considered to be an important risk measure due to its connection with gain‐loss ratio and its coherent and elicitable properties. Linear multiple expectile regression was proposed in 1987 for estimating the conditional...
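As background (the standard Newey–Powell definition, not a claim about this paper's notation), the $\tau$-expectile minimizes an asymmetric squared loss:

$$\mu_{\tau} = \arg\min_{m}\, E\left[\,\left|\tau - I(Y - m < 0)\right| (Y - m)^2\right],$$

so $\tau = 0.5$ recovers the ordinary mean, just as the median is the 0.5 quantile.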
Article
Sparse principal component analysis and sparse canonical correlation analysis are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Because nonsmoothness and nonconvexity bring n...
Article
When estimating coefficients in a linear model, the (sparse) composite quantile regression was first proposed in Zou and Yuan (2008) as an efficient alternative to the (sparse) least squares to handle arbitrary error distribution. The highly nonsmooth nature of the composite loss in the sparse composite quantile regression makes its theoretical ana...
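For orientation, the composite quantile regression criterion of Zou and Yuan (2008) combines $K$ quantile levels $\tau_k = k/(K+1)$ with a common slope (standard form, stated as background):

$$\min_{b_1,\dots,b_K,\,\beta}\ \sum_{k=1}^{K}\sum_{i=1}^{n} \rho_{\tau_k}\!\left(y_i - b_k - x_i^{\top}\beta\right), \qquad \rho_{\tau}(u) = u\left(\tau - I(u < 0)\right),$$

where the $b_k$ are level-specific intercepts; the piecewise-linear check functions $\rho_{\tau_k}$ are the source of the nonsmoothness mentioned above.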
Article
Principal components regression (PCR) is a well‐known method to achieve dimension reduction and often improved prediction over ordinary least squares. The conventional PCR retains the principal components with large variance and discards those with smaller variance. This operation can easily lead to poor prediction when the response variable is...
Article
Variants of the Lasso or ℓ1-penalized regression have been proposed to accommodate the presence of measurement errors in the covariates. Theoretical guarantees of these estimates have been established for some oracle values of the regularization parameters which are not known in practice. Data-driven tuning such as cross-validation has not been stu...
Preprint
Full-text available
Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvex...
Article
In many econometrics applications, the dataset under investigation spans heterogeneous regimes that are more appropriately modeled using piece-wise components for each of the data segments separated by change-points. We consider using Bayesian high-dimensional shrinkage priors in a change point setting to understand segment-specific relationship be...
Article
Distance Weighted Discrimination (DWD) is an interesting large margin classifier that has been shown to enjoy nice properties and empirical successes. The original DWD only handles binary classification with a linear classification boundary. Multiclass classification problems naturally appear in various fields, such as speech recognition, satellite...
Article
Martingale limit theory is increasingly important in modern probability theory and mathematical statistics. In this article, we give a selected overview of Peter Hall's contributions to both the theoretical foundations and the wide applicability of martingales. We highlight his celebrated coauthored book, Hall and Heyde (1980) and his ground-breaki...
Article
Rapid advances in technology have made classification with high-dimensional features a ubiquitous problem in modern scientific studies and applications. There are three fundamental goals in the pursuit of a good high‐dimensional classifier: accuracy, interpretability, and scalability. In the past 15 years, a host of competitive high‐dimensional c...
Article
Principal component analysis (PCA) is a widely used technique for dimension reduction, data processing, and feature extraction. The three tasks are particularly useful and important in high-dimensional data analysis and statistical learning. However, the regular PCA encounters great fundamental challenges under high dimensionality and may produce "...
Article
Sparse penalized quantile regression is a useful tool for variable selection, robust estimation, and heteroscedasticity detection in high-dimensional data analysis. The computational issue of the sparse penalized quantile regression has not yet been fully resolved in the literature, due to nonsmoothness of the quantile regression loss function. We...
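In its simplest lasso form, the estimator under discussion solves (written as background, reusing the check function $\rho_{\tau}$ above):

$$\min_{\beta}\ \frac{1}{n}\sum_{i=1}^{n} \rho_{\tau}\!\left(y_i - x_i^{\top}\beta\right) + \lambda \|\beta\|_1,$$

and the nonsmoothness of $\rho_{\tau}$ is precisely the computational obstacle the abstract refers to.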
Article
Distance-weighted discrimination (DWD) is a modern margin-based classifier with an interesting geometric motivation. It was proposed as a competitor to the support vector machine (SVM). Despite many recent references on DWD, DWD is far less popular than the SVM, mainly because of computational and theoretical reasons. We greatly advance the current...
Article
This study examined how speech babble noise differentially affected the auditory P3 responses and the associated neural oscillatory activities for consonant and vowel discrimination in relation to segmental- and sentence-level speech perception in noise. The data were collected from 16 normal-hearing participants in a double-oddball paradigm that c...
Article
Asymmetric least squares regression is an important method that has wide applications in statistics, econometrics and finance. The existing work on asymmetric least squares only considers the traditional low dimension and large sample setting. In this paper, we systematically study the Sparse Asymmetric LEast Squares (SALES) regression under high d...
Article
Full-text available
Tweedie's Compound Poisson model is a popular method to model data with probability mass at zero and non-negative, highly right-skewed distribution. Motivated by wide applications of the Tweedie model in various fields such as actuarial science, we investigate the grouped elastic net method for the Tweedie model in the context of the generalized li...
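As background on the model class (a standard characterization, not specific to this paper), a Tweedie compound Poisson response can be generated as

$$Y = \sum_{i=1}^{N} X_i, \qquad N \sim \mathrm{Poisson}(\lambda), \quad X_i \overset{iid}{\sim} \mathrm{Gamma}(\alpha, \gamma),$$

with $Y = 0$ whenever $N = 0$; its variance function is $V(\mu) = \mu^{p}$ with index $p \in (1, 2)$, which produces exactly the point mass at zero plus a right-skewed continuous part.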
Article
Consider $n$ independent and identically distributed $p$-dimensional Gaussian random vectors with covariance matrix $\Sigma$. The problem of estimating $\Sigma$ when $p$ is much larger than $n$ has received a lot of attention in recent years. Yet, little is known about the information criterion for covariance matrix estimation. How to properly define such a cri...
Article
Professors Cai, Ren and Zhou ought to be congratulated for writing such a wonderful expository paper on optimal estimation of high-dimensional covariance and precision matrices. Nearly all optimality results on large matrix estimation were established by the authors (and their coauthors). Thus, they are the most appropriate team to write this much n...
Article
We consider estimating multi-task quantile regression under the transnormal model, with a focus on the high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance mat...
Article
Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright (2012, AoS) proposed an interesting non-convex modification of the Lasso for doing high-dimensional regression w...
Article
Full-text available
In many applications, the dataset under investigation exhibits heterogeneous regimes that are more appropriately modeled using piece-wise linear models for each of the data segments separated by change-points. Although there has been much work on change point linear regression for the low dimensional case, high-dimensional change point regression...
Article
Full-text available
The Tweedie GLM is a widely used method for predicting insurance premiums. However, the structure of the logarithmic mean is restricted to a linear form in the Tweedie GLM, which can be too rigid for many applications. As a better alternative, we propose a gradient tree-boosting algorithm and apply it to Tweedie compound Poisson models for pure pre...
Article
Full-text available
Distance weighted discrimination (DWD) is a margin-based classifier with an interesting geometric motivation. DWD was originally proposed as a superior alternative to the support vector machine (SVM); however, DWD has yet to become as popular as the SVM. The main reasons are twofold. First, the state-of-the-art algorithm for solving DWD is based...
Article
Full-text available
Expectile, first introduced by Newey and Powell (1987) in the econometrics literature, has recently become increasingly popular in risk management and capital allocation for financial institutions due to its desirable properties such as coherence and elicitability. The current standard tool for expectile regression analysis is the multiple linear e...
Research
Full-text available
The Tweedie GLM is a widely used method for predicting insurance premiums. However, the linear model assumption can be too rigid for many applications. As a better alternative, a boosted Tweedie model is considered in this paper. We propose a TDboost estimator of pure premiums and use a profile likelihood approach to estimate the index and dispersi...
Article
Full-text available
In recent years many sparse linear discriminant analysis methods have been proposed for high-dimensional classification and variable selection. However, most of these proposals focus on binary classification and they are not directly applicable to multiclass classification problems. There are two sparse discriminant analysis methods that can handle...
Article
In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten and Tibshirani, 2011; Cai and Liu, 2011; Mai et al., 2012; Fan et al., 2012). In this paper, we develop high-dimensional sparse semiparametric discriminant analysis...
Article
Full-text available
Distance weighted discrimination (DWD) was originally proposed to handle the data piling issue in the support vector machine. In this paper, we consider the sparse penalized DWD for high-dimensional classification. The state-of-the-art algorithm for solving the standard DWD is based on second-order cone programming, however such an algorithm does n...
Article
Full-text available
Expectile regression [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819-847] is a nice tool for estimating the conditional expectiles of a response variable given a set of covariates. Expectile regression at the 50% level is the classical conditional mean regression. In many real applications having multiple...
Article
Sufficient dimension reduction (SDR) techniques have proven to be very useful data analysis tools in various applications. Underlying many SDR techniques is a critical assumption that the predictors are elliptically contoured. When this assumption appears to be wrong, practitioners usually try variable transformation such that the transformed predi...
Article
Varying coefficient models have been widely used in longitudinal data analysis, nonlinear time series, survival analysis, and so on. They are natural non-parametric extensions of the classical linear models in many contexts, keeping good interpretability and allowing us to explore the dynamic nature of the model. Recently, penalized estimators have...
Article
Full-text available
This paper concerns a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss functions satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of t...
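The block update at the heart of this idea is easy to sketch. Below is a minimal illustration for the squared error loss, where each group's loss is majorized by a quadratic with curvature gamma_g and the resulting subproblem has a closed-form groupwise soft-thresholding solution. The function names and interface are hypothetical illustrations, not the published software's API.

```python
import numpy as np

def group_soft_threshold(u, t):
    """Groupwise soft-thresholding: shrink the vector u toward 0 by t in l2 norm."""
    norm = np.linalg.norm(u)
    if norm <= t:
        return np.zeros_like(u)
    return (1.0 - t / norm) * u

def group_lasso_gmd(X, y, groups, lam, n_iter=200):
    """Minimal groupwise-majorization-descent sketch: least squares + group lasso.

    groups: list of column-index arrays, one per group (hypothetical interface).
    Each group's loss is majorized by a quadratic with curvature gamma_g
    (largest eigenvalue of X_g'X_g / n), giving a closed-form group update.
    """
    n, p = X.shape
    beta = np.zeros(p)
    gammas = [max(np.linalg.eigvalsh(X[:, g].T @ X[:, g] / n)[-1], 1e-12)
              for g in groups]
    r = y - X @ beta  # residual, maintained incrementally
    for _ in range(n_iter):
        for g, gamma in zip(groups, gammas):
            grad = -X[:, g].T @ r / n        # gradient of the loss w.r.t. beta_g
            u = beta[g] - grad / gamma        # minimizer of the quadratic majorant
            new = group_soft_threshold(u, lam / gamma)
            r -= X[:, g] @ (new - beta[g])    # update residual for the change
            beta[g] = new
    return beta
```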
Article
Full-text available
Yi and Zou (2013) proposed a Stein's unbiased risk estimator (SURE) for the tapering covariance estimator and suggested using the minimizer of SURE as the chosen tapering parameter. Motivated by the deep connection between SURE and AIC in regression models, we propose a family of generalized SURE (gSURE) indexed by c where c is the same constant in...
Article
Koenker (1993) discovered an interesting distribution whose α quantile and α expectile coincide for every α in (0,1). We analytically characterize the distribution whose ω(α) expectile and α quantile coincide, where ω(·) can be any monotone function. We further apply the general theory to derive generalized Koenker's distributions corr...
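To make the coincidence concrete (standard definitions, stated as background): the α quantile $q_{\alpha}$ satisfies $F(q_{\alpha}) = \alpha$, while the α expectile $\mu_{\alpha}$ is defined by the first-order condition

$$\alpha\, E\left[(Y - \mu_{\alpha})_{+}\right] = (1 - \alpha)\, E\left[(\mu_{\alpha} - Y)_{+}\right],$$

and the question is for which distributions $\mu_{\omega(\alpha)} = q_{\alpha}$ holds for all α.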
Article
Graphical models are commonly used tools for modeling multivariate random variables. While there exist many convenient multivariate distributions such as Gaussian distribution for continuous data, mixed data with the presence of discrete variables or a combination of both continuous and discrete variables poses new challenges in statistical modelin...
Article
A new model-free screening method named fused Kolmogorov filter is proposed for high-dimensional data analysis. This new method is fully nonparametric and can work with many types of covariates and response variables, including continuous, discrete and categorical variables. We apply the fused Kolmogorov filter to deal with variable screening probl...
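To illustrate the flavor of model-free marginal screening, here is a minimal sketch that scores features by a two-sample Kolmogorov–Smirnov statistic for a binary response. This is a simplified illustration only, not the fused Kolmogorov filter itself, which additionally handles multi-class, discrete, and continuous responses by fusing over slicing schemes.

```python
import numpy as np
from scipy.stats import ks_2samp

def kolmogorov_screen(X, y, top_k=50):
    """Minimal sketch of Kolmogorov-Smirnov marginal screening for binary y.

    Scores each feature by the two-sample KS statistic between the two
    class-conditional samples and keeps the top_k features. Simplified
    illustration of the screening idea; names and interface are hypothetical.
    """
    scores = np.array([
        ks_2samp(X[y == 0, j], X[y == 1, j]).statistic
        for j in range(X.shape[1])
    ])
    return np.argsort(scores)[::-1][:top_k]  # indices of the top_k features
```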
Article
We introduce a constrained empirical loss minimization framework for estimating high-dimensional sparse precision matrices and propose a new loss function, called the D-trace loss, for that purpose. A novel sparse precision matrix estimator is defined as the minimizer of the lasso penalized D-trace loss under a positive-definiteness constraint. Und...
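For reference, the D-trace loss introduced here has a simple quadratic form (my reading of the construction, stated as a sketch):

$$L_D(\Theta, \hat{\Sigma}) = \frac{1}{2}\left\langle \Theta^2, \hat{\Sigma} \right\rangle - \mathrm{tr}(\Theta),$$

which is minimized at $\Theta = \hat{\Sigma}^{-1}$ when $\hat{\Sigma}$ is invertible; the proposed estimator adds a lasso penalty on the off-diagonal entries and the positive-definiteness constraint $\Theta \succeq \epsilon I$.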
Article
The non-paranormal model assumes that the variables follow a multivariate normal distribution after a set of unknown monotone increasing transformations. It is a flexible generalization of the normal model but retains the nice interpretability of the latter. We propose a rank-based tapering estimator for estimating the correlation matrix in the non...
Article
Statistical inference of semiparametric Gaussian copulas is well studied in the classical fixed dimension and large sample size setting. Nevertheless, optimal estimation of the correlation matrix of semiparametric Gaussian copula is understudied, especially when the dimension can far exceed the sample size. In this paper we derive the minimax rate...
Article
Quantile regression provides a more thorough view of the effect of covariates on a response. Non-parametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, as important variables can influence various quantiles in different ways...
Article
In the last decade, the demand for statistical and computational methods for data analysis that involve sparse matrices has grown dramatically. The main reason for this is that the classical approaches produce solutions in the form of linear combinations of all variables involved in the problem. However, applications nowadays deal with huge data se...
Article
In this article, we reveal the connection between and equivalence of three sparse linear discriminant analysis methods: the ℓ1-Fisher's discriminant analysis proposed by Wu et al. in 2008, the sparse optimal scoring proposed by Clemmensen et al. in 2011, and the direct sparse discriminant analysis (DSDA) proposed by Mai et al. in 2012. It is shown...
Article
Chandrasekaran, Parrilo, and Willsky (2012) proposed a convex optimization problem for graphical model selection in the presence of unobserved variables. This convex optimization problem aims to estimate an inverse covariance matrix that can be decomposed into a sparse matrix minus a low-rank matrix from sample data. Solving this convex optimizatio...
Article
In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten & Tibshirani 2011; Cai & Liu 2011; Mai et al. 2012; Fan et al. 2012). In this paper, we develop high-dimensional semiparametric sparse discriminant analysis (HD-SeSDA)...
Article
Cai et al. (2010) have studied the minimax optimal estimation of a collection of large bandable covariance matrices whose off-diagonal entries decay to zero at a polynomial rate. They have shown that the minimax optimal procedures are fundamentally different under Frobenius and spectral norms, regardless of the rate of polynomial decay. To gain...
Article
Variable screening techniques have been proposed to mitigate the impact of high dimensionality in classification problems, including t-test marginal screening (Fan & Fan, 2008) and maximum marginal likelihood screening (Fan & Song, 2010). However, these methods rely on strong modelling assumptions that are easily violated in real applications. To c...
Article
A sparse precision matrix can be directly translated into a sparse Gaussian graphical model under the assumption that the data follow a joint normal distribution. This neat property makes high-dimensional precision matrix estimation very appealing in many applications. However, in practice we often face nonnormal data, and variable transformation i...
Article
Bandable covariance matrices are often used to model the dependence structure of variables that follow a natural order. It has been shown that the tapering covariance estimator attains the optimal minimax rates of convergence for estimating large bandable covariance matrices. The estimation risk critically depends on the choice of the tapering param...
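For context, the tapering estimator referenced here multiplies the sample covariance entrywise by tapering weights; in the Cai–Zhang–Zhou construction (as commonly stated, offered here as background), with tapering parameter $k$ and $k_h = k/2$,

$$\hat{\Sigma}_{k} = \left(w_{ij}\,\hat{\sigma}_{ij}\right), \qquad w_{ij} = \frac{1}{k_h}\left\{\left(k - |i-j|\right)_{+} - \left(k_h - |i-j|\right)_{+}\right\},$$

so entries within $k_h$ of the diagonal are kept intact, entries beyond $k$ are zeroed out, and those in between are linearly downweighted; the choice of $k$ is exactly the tuning problem the abstract addresses.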
Article
Full-text available
We introduce a cocktail algorithm, a good mixture of coordinate descent, the majorization-minimization principle and the strong rule, for computing the solution paths of the elastic net penalized Cox's proportional hazards model. The cocktail algorithm enjoys a proven convergence property. We have implemented the cocktail algorithm in an R package f...
Article
Full-text available
The thresholding covariance estimator has nice asymptotic properties for estimating sparse large covariance matrices, but it often has negative eigenvalues when used in real data analysis. To fix this drawback of thresholding estimation, we develop a positive-definite ℓ1-penalized covariance estimator for estimating sparse large covariance matric...
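A natural way to write the estimator described here (my reading of the abstract, offered as a sketch) is

$$\hat{\Sigma}^{+} = \arg\min_{\Sigma \succeq \epsilon I}\ \frac{1}{2}\left\|\Sigma - \hat{\Sigma}_{\mathrm{sample}}\right\|_F^2 + \lambda \sum_{i \ne j} |\sigma_{ij}|,$$

a soft-thresholding-type fit to the sample covariance with an explicit eigenvalue floor $\epsilon > 0$, which enforces the positive definiteness that plain thresholding cannot guarantee.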
Article
Full-text available
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it...
Article
Full-text available
The Ising model is a useful tool for studying complex interactions within a system. The estimation of such a model, however, is rather challenging, especially in the presence of high-dimensional parameters. In this work, we propose efficient procedures for learning a sparse Ising model based on a penalized composite conditional likelihood with nonc...
Article
Full-text available
The glmnet package by Friedman et al. [Regularization paths for generalized linear models via coordinate descent, J. Statist. Softw. 33 (2010), pp. 1–22] is an extremely fast implementation of the standard coordinate descent algorithm for solving ℓ1 penalized learning problems. In this paper, we consider a family of coordinate majorization descent...
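To fix ideas, here is a minimal coordinate descent sketch for the lasso with squared error loss, the baseline that coordinate majorization descent generalizes by replacing the exact coordinatewise minimization with a quadratic majorization (useful for losses such as the squared hinge). Names and interface are illustrative, not the glmnet API.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimal cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.maximum((X ** 2).sum(axis=0) / n, 1e-12)  # coordinate curvatures
    r = y - X @ beta  # residual, maintained incrementally
    for _ in range(n_iter):
        for j in range(p):
            # partial-residual statistic for coordinate j
            zj = X[:, j] @ r / n + col_sq[j] * beta[j]
            new = soft_threshold(zj, lam) / col_sq[j]
            r -= X[:, j] * (new - beta[j])  # update residual for the change
            beta[j] = new
    return beta
```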
Article
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these r...
Article
Full-text available
The hybrid Huberized support vector machine (HHSVM) has proved its advantages over the ℓ1 support vector machine (SVM) in terms of classification and variable selection. Similar to the ℓ1 SVM, the HHSVM enjoys a piecewise linear path property and can be computed by a LARS-type piecewise linear solution path algorithm. In this paper we propose a gen...
Article
Full-text available
Compressed sensing is a very powerful and popular tool for sparse recovery of high dimensional signals. Random sensing matrices are often employed in compressed sensing. In this paper we introduce a new method named aggressive betting using sure independence screening for sparse noiseless signal recovery. The proposal exploits the randomness struct...
Article
Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficu...
Article
Full-text available
The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonp...
Article
Full-text available
In linear regression problems with related predictors, it is desirable to do variable selection and estimation by maintaining the hierarchical or structural relationships among predictors. In this paper we propose non-negative garrote methods that can naturally incorporate such relationships defined through effect heredity principles or marginality...
Article
Full-text available
Factor analysis is a popular multivariate analysis method which is used to describe observed variables as linear combinations of hidden factors. In applications one usually needs to rotate the estimated factor loading matrix in order to obtain a more understandable model. In this article, an ℓ1 penalization method is introduced for performing spar...
Article
Local polynomial regression is a useful non-parametric regression tool to explore fine data structures and has been widely used in practice. We propose a new non-parametric regression technique called local composite quantile regression smoothing to improve local polynomial regression further. Sampling properties of the estimation procedure propose...
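A sketch of the construction (my reading of local linear CQR, hedged): at a target point $x_0$ one minimizes the kernel-weighted composite check loss

$$\min_{a_1,\dots,a_q,\, b}\ \sum_{k=1}^{q}\sum_{i=1}^{n} \rho_{\tau_k}\!\left(y_i - a_k - b\,(x_i - x_0)\right) K\!\left(\frac{x_i - x_0}{h}\right),$$

with $\tau_k = k/(q+1)$, and estimates the regression function at $x_0$ by averaging the fitted intercepts $\hat{a}_k$.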
Article
Full-text available
Local polynomial regression is a useful nonparametric regression tool to explore fine data structures and has been widely used in practice. In this paper, we propose a new nonparametric regression technique called local composite-quantile-regression (CQR) smoothing in order to further improve local polynomial regression. Sampling properties of the...
Article
Full-text available
We consider efficient construction of nonlinear solution paths for general ℓ1-regularization. Unlike the existing methods that incrementally build the solution path through a combination of local linear approximation and recalibration, we propose an efficient global approximation to the whole solution path. With the loss function approximated by...
Article
Full-text available
We consider the problem of model selection and estimation in situations where the number of parameters diverges with the sample size. When the dimension is high, an ideal method should have the oracle property (Fan and Li, 2001; Fan and Peng, 2004) which ensures the optimal large sample performance. Furthermore, the high-dimensionality often induce...
Article
Full-text available
Fisher-consistent loss functions play a fundamental role in the construction of successful binary margin-based classifiers. In this paper we establish the Fisher-consistency condition for multicategory classification problems. Our approach uses the margin vector concept which can be regarded as a multicategory generalization of the binary margin. W...
Article
Full-text available
We would like to take this opportunity to thank the discussants for their thoughtful comments and encouragement [P. Bühlmann and L. Meier, ibid. 36, No. 4, 1534–1541 (2008; Zbl 1282.62096); X.-L. Meng, ibid. 36, No. 4, 1542–1552 (2008; Zbl 1282.62104); C.-H. Zhang, ibid. 36, No. 4, 1553–1560 (2008; Zbl 1282.62110)] on our work [ibid. 36, No...
Article
Fan & Li (2001) propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective function is nondifferentiable and nonconcave. In t...
