## About

126 Publications · 44,307 Reads


37,461 Citations (since 2017)


## Publications


High-dimensional regression and regression with a left-censored response are each well-studied topics. In spite of this, few methods have been proposed which deal with both of these complications simultaneously. The Tobit model---long the standard method for censored regression in economics---has not been adapted for high-dimensional regression at...

In statistical analysis, researchers often perform coordinatewise Gaussianization such that each variable is marginally normal. The normal score transformation is a method for coordinatewise Gaussianization and is widely used in statistics, econometrics, genetics and other areas. However, few studies exist on the theoretical properties of the norma...
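
The normal score transformation described above is simple to sketch. Below is a generic textbook version (the helper name and the `n + 1` rank denominator are common illustrative choices, not the paper's exact construction):

```python
# Coordinatewise Gaussianization via the normal score transformation:
# replace each column's values by standard normal quantiles of their ranks.
import numpy as np
from scipy.stats import norm, rankdata

def normal_score(X):
    """Map each column of an (n, p) matrix to approximately N(0, 1)
    margins using ranks; ties receive average ranks."""
    n = X.shape[0]
    ranks = np.apply_along_axis(rankdata, 0, X)   # ranks 1..n per column
    return norm.ppf(ranks / (n + 1.0))            # rank -> normal quantile

rng = np.random.default_rng(0)
X = rng.exponential(size=(500, 3))                # heavily right-skewed data
Z = normal_score(X)
print(bool(np.abs(Z.mean(axis=0)).max() < 1e-8))  # → True (margins exactly centered)
```

Because the transform is rank-based, it is invariant to any monotone distortion of each coordinate, which is what makes its theoretical analysis delicate.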

The support vector machine (SVM) is a popular classification method which enjoys good performance in many real applications. The SVM can be viewed as a penalized minimization problem in which the objective function is the expectation of hinge loss function with respect to the standard non-smooth empirical measure corresponding to the true underlyin...
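
As a concrete reference point, the penalized hinge-loss objective of the standard linear SVM can be written out directly. This is the generic textbook form (the function name is illustrative, and it uses the plain empirical measure, not the smoothed measure studied in the paper):

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """Regularized empirical hinge loss of a linear SVM:
    0.5 * ||w||^2 + C * mean_i max(0, 1 - y_i * (x_i . w + b)),
    with labels y coded as +1 / -1."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * hinge.mean()

X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
print(svm_objective(np.array([1.0, 0.0]), 0.0, X, y))  # → 0.5 (both margins = 2, no hinge loss)
```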

There is a vast amount of work on high dimensional regression. The common starting point for the existing theoretical work is to assume the data generating model is a homoscedastic linear regression model with some sparsity structure. In reality the homoscedasticity assumption is often violated, and hence understanding the heteroscedasticity of the...

Motivated by the Golub-Heath-Wahba formula for ridge regression, we first present a new leave-one-out lemma for the kernel support vector machines (SVM) and related large-margin classifiers. We then use the lemma to design a novel and efficient algorithm, named “magicsvm”, for training the kernel SVM and related large-margin classifiers and computi...
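
The Golub-Heath-Wahba identity that motivates this work can be checked numerically for plain ridge regression: the leave-one-out residual equals the fitted residual divided by one minus the hat-matrix diagonal, so no refitting is needed. This sketch verifies the identity against brute-force refits; it illustrates only the classical ridge case, not the kernel-SVM extension the paper develops:

```python
import numpy as np

# Numerical check of the Golub-Heath-Wahba leave-one-out identity for
# ridge regression with a fixed penalty lam.
rng = np.random.default_rng(1)
n, p, lam = 50, 5, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # ridge hat matrix
resid = y - H @ y
loo_shortcut = resid / (1.0 - np.diag(H))                 # GHW closed form

loo_brute = np.empty(n)                                   # n explicit refits
for i in range(n):
    mask = np.arange(n) != i
    beta = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(p),
                           X[mask].T @ y[mask])
    loo_brute[i] = y[i] - X[i] @ beta

print(np.allclose(loo_shortcut, loo_brute))               # → True
```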

Many machine learning models have tuning parameters to be determined by the training data, and cross‐validation (CV) is perhaps the most commonly used method for selecting tuning parameters. This work concerns the problem of estimating the generalization error of a CV‐tuned predictive model. We propose to use an honest leave‐one‐out cross‐validatio...

Huber regression (HR) is a popular robust alternative to the least squares regression when the error follows a heavy-tailed distribution. We propose a new method called the enveloped Huber regression (EHR) by considering the envelope assumption that there exists some subspace of the predictors that has no association with the response, which is ref...
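
Plain (unenveloped) Huber regression can be sketched with iteratively reweighted least squares. The function name, the default `delta`, and the IRLS scheme below are generic illustrative choices, not the EHR estimator:

```python
import numpy as np

def huber_regression(X, y, delta=1.345, n_iter=50):
    """Huber regression via iteratively reweighted least squares:
    observations with residuals beyond delta are down-weighted."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS warm start
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r))
        Xw = X * w[:, None]                           # diag(w) @ X
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # weighted least squares
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(200)
y[:10] += 50.0                                        # gross outliers
print(huber_regression(X, y).round(2))                # outliers are heavily down-weighted
```

With a very large `delta` all weights equal one and the estimate reduces to ordinary least squares, which makes Huber regression a smooth interpolation between OLS and median-type robustness.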

It is very interesting to learn the history of ridge analysis/ridge regression as well as stories of its inventors from Professor Hoerl’s article. The overview article has covered many important aspects of ridge regression, regularization more generally, and their modern applications. Ridge has indeed become an essential concept in data science. My...

Expectile is a generalization of the expected value in probability and statistics. In finance and risk management, the expectile is considered to be an important risk measure due to its connection with gain‐loss ratio and its coherent and elicitable properties. Linear multiple expectile regression was proposed in 1987 for estimating the conditional...
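
For a single sample, the tau-expectile solves an asymmetric least squares first-order condition and can be computed with a simple fixed-point iteration on a weighted mean. This is a minimal illustrative sketch of the concept, not the paper's regression methodology:

```python
import numpy as np

def expectile(x, tau=0.5, n_iter=100):
    """tau-expectile of a sample: the solution mu of the asymmetric
    least squares first-order condition, found by iterating the
    asymmetrically weighted mean."""
    mu = x.mean()
    for _ in range(n_iter):
        w = np.where(x > mu, tau, 1.0 - tau)   # asymmetric weights
        mu = np.average(x, weights=w)
    return mu

x = np.arange(1.0, 101.0)                      # sample 1, 2, ..., 100
print(round(expectile(x, 0.5), 6))             # → 50.5 (the 0.5-expectile is the mean)
print(expectile(x, 0.9) > expectile(x, 0.1))   # → True (expectiles increase with tau)
```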

Sparse principal component analysis and sparse canonical correlation analysis are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Because nonsmoothness and nonconvexity bring n...

When estimating coefficients in a linear model, the (sparse) composite quantile regression was first proposed in Zou and Yuan (2008) as an efficient alternative to the (sparse) least squares to handle arbitrary error distribution. The highly nonsmooth nature of the composite loss in the sparse composite quantile regression makes its theoretical ana...

Principal components regression (PCR) is a well‐known method to achieve dimension reduction and often improved prediction over the ordinary least squares. The conventional PCR retains the principal components with large variance and discards those with smaller variance. This operation can easily lead to poor prediction when the response variable is...
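
The conventional PCR baseline that the abstract critiques is a few lines of linear algebra: project onto the top-k principal components of the centered design, regress on the orthogonal scores, and map back. The helper below is a generic sketch:

```python
import numpy as np

def pcr(X, y, k):
    """Conventional principal components regression: regress centered y
    on the top-k principal component scores of centered X, then map the
    coefficients back to the original predictors."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    gamma = (U[:, :k].T @ yc) / s[:k]          # OLS on orthonormal scores
    return Vt[:k].T @ gamma                    # back to predictor coordinates

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))
y = X @ np.r_[1.0, np.zeros(9)] + 0.1 * rng.standard_normal(200)
print(pcr(X, y, k=10).round(1)[0])             # → 1.0 (k = p reproduces least squares)
```

Keeping only large-variance components (small k) is exactly the operation the abstract warns about: nothing forces the high-variance directions to be the ones correlated with the response.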

Variants of the Lasso or ℓ1-penalized regression have been proposed to accommodate the presence of measurement errors in the covariates. Theoretical guarantees of these estimates have been established for some oracle values of the regularization parameters, which are not known in practice. Data-driven tuning such as cross-validation has not been stu...

Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvex...

In many econometrics applications, the dataset under investigation spans heterogeneous regimes that are more appropriately modeled using piece-wise components for each of the data segments separated by change-points. We consider using Bayesian high-dimensional shrinkage priors in a change point setting to understand segment-specific relationship be...

Distance Weighted Discrimination (DWD) is an interesting large margin classifier that has been shown to enjoy nice properties and empirical successes. The original DWD only handles binary classification with a linear classification boundary. Multiclass classification problems naturally appear in various fields, such as speech recognition, satellite...

Martingale limit theory is increasingly important in modern probability theory and mathematical statistics. In this article, we give a selected overview of Peter Hall's contributions to both the theoretical foundations and the wide applicability of martingales. We highlight his celebrated coauthored book, Hall and Heyde (1980) and his ground-breaki...

Rapid advances in technology have made classification with high-dimensional features a ubiquitous problem in modern scientific studies and applications. There are three fundamental goals in the pursuit of a good high‐dimensional classifier: accuracy, interpretability, and scalability. In the past 15 years, a host of competitive high‐dimensional c...

Principal component analysis (PCA) is a widely used technique for dimension reduction, data processing, and feature extraction. The three tasks are particularly useful and important in high-dimensional data analysis and statistical learning. However, the regular PCA encounters great fundamental challenges under high dimensionality and may produce "...

Sparse penalized quantile regression is a useful tool for variable selection, robust estimation, and heteroscedasticity detection in high-dimensional data analysis. The computational issue of the sparse penalized quantile regression has not yet been fully resolved in the literature, due to nonsmoothness of the quantile regression loss function. We...

Distance-weighted discrimination (DWD) is a modern margin-based classifier with an interesting geometric motivation. It was proposed as a competitor to the support vector machine (SVM). Despite many recent references on DWD, DWD is far less popular than the SVM, mainly because of computational and theoretical reasons. We greatly advance the current...

This study examined how speech babble noise differentially affected the auditory P3 responses and the associated neural oscillatory activities for consonant and vowel discrimination in relation to segmental- and sentence-level speech perception in noise. The data were collected from 16 normal-hearing participants in a double-oddball paradigm that c...

Asymmetric least squares regression is an important method that has wide applications in statistics, econometrics and finance. The existing work on asymmetric least squares only considers the traditional low dimension and large sample setting. In this paper, we systematically study the Sparse Asymmetric LEast Squares (SALES) regression under high d...

Tweedie's Compound Poisson model is a popular method to model data with probability mass at zero and non-negative, highly right-skewed distribution. Motivated by wide applications of the Tweedie model in various fields such as actuarial science, we investigate the grouped elastic net method for the Tweedie model in the context of the generalized li...

Consider $n$ independent and identically distributed $p$-dimensional Gaussian random vectors with covariance matrix $\Sigma$. The problem of estimating $\Sigma$ when $p$ is much larger than $n$ has received a lot of attention in recent years. Yet, little is known about the information criterion for covariance matrix estimation. How to properly define such a cri...

Professors Cai, Ren and Zhou ought to be congratulated for writing such a wonderful expository paper on optimal estimation of high-dimensional covariance and precision matrices. Nearly all optimality results on large matrix estimation were established by the authors (and their coauthors). Thus, they are the most appropriate team to write this much n...

We consider estimating multi-task quantile regression under the transnormal model, with focus on high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance mat...

Much theoretical and applied work has been devoted to high-dimensional
regression with clean data. However, we often face corrupted data in many
applications where missing data and measurement errors cannot be ignored. Loh
and Wainwright (2012, AoS) proposed an interesting non-convex modification of
the Lasso for doing high-dimensional regression w...

In many applications, the dataset under investigation exhibits heterogeneous
regimes that are more appropriately modeled using piece-wise linear models for
each of the data segments separated by change-points. Although there has been
much work on change-point linear regression for the low-dimensional case,
high-dimensional change point regression...

The Tweedie GLM is a widely used method for predicting insurance premiums. However, the structure of the logarithmic mean is restricted to a linear form in the Tweedie GLM, which can be too rigid for many applications. As a better alternative, we propose a gradient tree-boosting algorithm and apply it to Tweedie compound Poisson models for pure pre...

Distance weighted discrimination (DWD) is a margin-based classifier with an interesting geometric motivation. DWD was originally proposed as a superior alternative to the support vector machine (SVM); however, DWD is yet to be as popular as the SVM. The main reasons are twofold. First, the state-of-the-art algorithm for solving DWD is based...

Expectile, first introduced by Newey and Powell (1987) in the econometrics
literature, has recently become increasingly popular in risk management and
capital allocation for financial institutions due to its desirable properties
such as coherence and elicitability. The current standard tool for expectile
regression analysis is the multiple linear e...

The Tweedie GLM is a widely used method for predicting insurance premiums. However, the linear model assumption can be too rigid for many applications. As a better alternative, a boosted Tweedie model is considered in this paper. We propose a TDboost estimator of pure premiums and use a profile likelihood approach to estimate the index and dispersi...

In recent years many sparse linear discriminant analysis methods have been
proposed for high-dimensional classification and variable selection. However,
most of these proposals focus on binary classification and they are not
directly applicable to multiclass classification problems. There are two sparse
discriminant analysis methods that can handle...

In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten and Tibshirani, 2011, Cai and Liu, 2011, Mai et al., 2012 and Fan et al., 2012). In this paper, we develop high-dimensional sparse semiparametric discriminant analysis...

Distance weighted discrimination (DWD) was originally proposed to handle the
data piling issue in the support vector machine. In this paper, we consider the
sparse penalized DWD for high-dimensional classification. The state-of-the-art
algorithm for solving the standard DWD is based on second-order cone
programming; however, such an algorithm does n...

Expectile regression [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819-847] is a nice tool for estimating the conditional expectiles of a response variable given a set of covariates. Expectile regression at 50% level is the classical conditional mean regression. In many real applications having multiple...

Sufficient dimension reduction (SDR) techniques have proven to be very useful data analysis tools in various applications. Underlying many SDR techniques is a critical assumption that the predictors are elliptically contoured. When this assumption appears to be wrong, practitioners usually try variable transformation such that the transformed predi...

Varying coefficient models have been widely used in longitudinal data analysis, nonlinear time series, survival analysis, and so on. They are natural non-parametric extensions of the classical linear models in many contexts, keeping good interpretability and allowing us to explore the dynamic nature of the model. Recently, penalized estimators have...

This paper concerns a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss function satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of t...
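
The closed-form groupwise update at the heart of GMD-style algorithms is block soft-thresholding, which shrinks an entire group of coefficients at once. A minimal sketch (the function name is illustrative, and this is only the per-group proximal step, not the full GMD path algorithm):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Block soft-thresholding: shrink the whole group v toward zero,
    setting it exactly to zero when its Euclidean norm is at most lam.
    This is the closed-form update behind groupwise descent for the
    group lasso."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

print(group_soft_threshold(np.array([3.0, 4.0]), lam=1.0))  # → [2.4 3.2]
print(group_soft_threshold(np.array([0.3, 0.4]), lam=1.0))  # → [0. 0.]
```

Because a group is either zeroed as a whole or shrunk as a whole, variables enter and leave the model in their predefined groups.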

Yi and Zou (2013) proposed a Stein's unbiased risk estimator (SURE) for the
tapering covariance estimator and suggested using the minimizer of SURE as the
chosen tapering parameter. Motivated by the deep connection between SURE and
AIC in regression models, we propose a family of generalized SURE (gSURE)
indexed by c where c is the same constant in...

Koenker (1993) discovered an interesting distribution whose α quantile and α expectile coincide for every α in (0,1). We analytically characterize the distribution whose ω(α) expectile and α quantile coincide, where ω(·) can be any monotone function. We further apply the general theory to derive generalized Koenker's distributions corr...

Graphical models are commonly used tools for modeling multivariate random
variables. While there exist many convenient multivariate distributions such as
Gaussian distribution for continuous data, mixed data with the presence of
discrete variables or a combination of both continuous and discrete variables
poses new challenges in statistical modelin...

A new model-free screening method named fused Kolmogorov filter is proposed
for high-dimensional data analysis. This new method is fully nonparametric and
can work with many types of covariates and response variables, including
continuous, discrete and categorical variables. We apply the fused Kolmogorov
filter to deal with variable screening probl...

We introduce a constrained empirical loss minimization framework for estimating high-dimensional sparse precision matrices
and propose a new loss function, called the D-trace loss, for that purpose. A novel sparse precision matrix estimator is defined
as the minimizer of the lasso penalized D-trace loss under a positive-definiteness constraint. Und...

The non-paranormal model assumes that the variables follow a multivariate normal distribution after a set of unknown monotone increasing transformations. It is a flexible generalization of the normal model but retains the nice interpretability of the latter. We propose a rank-based tapering estimator for estimating the correlation matrix in the non...

Statistical inference of semiparametric Gaussian copulas is well studied in the classical fixed dimension and large sample size setting. Nevertheless, optimal estimation of the correlation matrix of semiparametric Gaussian copula is understudied, especially when the dimension can far exceed the sample size. In this paper we derive the minimax rate...

Quantile regression provides a more thorough view of the effect of covariates on a response. Non-parametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, as important variables can influence various quantiles in different ways...

In the last decade, the demand for statistical and computational methods for data analysis that involve sparse matrices has grown dramatically. The main reason for this is that the classical approaches produce solutions in the form of linear combinations of all variables involved in the problem. However, modern applications deal with huge data se...

In this article, we reveal the connection between and equivalence of three sparse linear discriminant analysis methods: the l1-Fisher’s discriminant analysis proposed by Wu et al. in 2008, the sparse optimal scoring proposed by Clemmensen et al. in 2011, and the direct sparse discriminant analysis (DSDA) proposed by Mai et al. in 2012. It is shown...

Chandrasekaran, Parrilo, and Willsky (2012) proposed a convex optimization problem for graphical model selection in the presence of unobserved variables. This convex optimization problem aims to estimate an inverse covariance matrix that can be decomposed into a sparse matrix minus a low-rank matrix from sample data. Solving this convex optimizatio...

In recent years, a considerable amount of work has been devoted to
generalizing linear discriminant analysis to overcome its incompetence for
high-dimensional classification (Witten & Tibshirani 2011, Cai & Liu 2011, Mai
et al. 2012, Fan et al. 2012). In this paper, we develop high-dimensional
semiparametric sparse discriminant analysis (HD-SeSDA)...

Cai et al. (2010) [4] have studied the minimax optimal estimation of a collection of large bandable covariance matrices whose off-diagonal entries decay to zero at a polynomial rate. They have shown that the minimax optimal procedures are fundamentally different under Frobenius and spectral norms, regardless of the rate of polynomial decay. To gain...

Variable screening techniques have been proposed to mitigate the impact of high dimensionality in classification problems,
including t-test marginal screening (Fan & Fan, 2008) and maximum marginal likelihood screening (Fan & Song, 2010). However, these methods
rely on strong modelling assumptions that are easily violated in real applications. To c...

A sparse precision matrix can be directly translated into a sparse Gaussian
graphical model under the assumption that the data follow a joint normal
distribution. This neat property makes high-dimensional precision matrix
estimation very appealing in many applications. However, in practice we often
face nonnormal data, and variable transformation i...

Bandable covariance matrices are often used to model the dependence structure of variables that follow a natural order. It has been shown that the tapering covariance estimator attains the optimal minimax rates of convergence for estimating large bandable covariance matrices. The estimation risk critically depends on the choice of the tapering param...
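
The tapering estimator referenced here multiplies the sample covariance entrywise by a banded weight matrix; a minimal sketch of the standard construction (the bandwidth convention below is one common illustrative choice):

```python
import numpy as np

def tapering_weights(p, k):
    """Tapering weight matrix: full weight within bandwidth k/2 of the
    diagonal, decaying linearly to zero by bandwidth k."""
    d = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.clip(2.0 - d / (k / 2.0), 0.0, 1.0)

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 6))
S_taper = tapering_weights(6, 4) * np.cov(X, rowvar=False)  # entrywise product
print(tapering_weights(6, 4)[0])   # first row weights: 1, 1, 1, 0.5, 0, 0
```

The tapering parameter k is exactly the tuning quantity whose data-driven choice the abstract is concerned with.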

We introduce a cocktail algorithm, a good mixture of coordinate descent, the majorization-minimization principle and the strong rule, for computing the solution paths of the elastic net penalized Cox's proportional hazards model. The cocktail algorithm enjoys a proven convergence property. We have implemented the cocktail algorithm in an R package f...

The thresholding covariance estimator has nice asymptotic properties for estimating sparse large covariance matrices, but it often has negative eigenvalues when used in real data analysis. To fix this drawback of thresholding estimation, we develop a positive-definite ℓ1-penalized covariance estimator for estimating sparse large covariance matric...
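
The issue is easy to reproduce: entrywise soft-thresholding of a rank-deficient sample covariance need not stay positive definite. The sketch below shows the plain thresholding estimator together with a crude eigenvalue-clipping repair; both helpers are generic illustrations, not the paper's penalized estimator:

```python
import numpy as np

def soft_threshold_cov(S, lam):
    """Soft-threshold the off-diagonal entries of a sample covariance
    matrix; the result need not be positive definite."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))               # keep the variances intact
    return T

def clip_to_pd(T, eps=1e-4):
    """Crude repair by eigenvalue clipping -- a simple projection used
    here only to illustrate the problem being solved."""
    vals, vecs = np.linalg.eigh(T)
    return (vecs * np.maximum(vals, eps)) @ vecs.T

rng = np.random.default_rng(5)
X = rng.standard_normal((20, 40))                 # p = 40 > n = 20
S = np.cov(X, rowvar=False)
T = soft_threshold_cov(S, lam=0.2)
print(np.linalg.eigvalsh(clip_to_pd(T)).min() >= 1e-4 - 1e-10)   # → True
```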

Folded concave penalization methods have been shown to enjoy the strong
oracle property for high-dimensional sparse estimation. However, a folded
concave penalization problem usually has multiple local solutions and the
oracle property is established only for one of the unknown local solutions. A
challenging fundamental issue still remains that it...

The Ising model is a useful tool for studying complex interactions within a
system. The estimation of such a model, however, is rather challenging,
especially in the presence of high-dimensional parameters. In this work, we
propose efficient procedures for learning a sparse Ising model based on a
penalized composite conditional likelihood with nonc...

The glmnet package by Friedman et al. [Regularization paths for generalized linear models via coordinate descent, J. Statist. Softw. 33 (2010), pp. 1–22] is an extremely fast implementation of the standard coordinate descent algorithm for solving ℓ1 penalized learning problems. In this paper, we consider a family of coordinate majorization descent...
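
The standard coordinate descent that glmnet implements can be written in a few lines for the plain lasso. The sketch below is a bare-bones illustration (no screening rules, warm starts, or standardization), not the glmnet implementation itself:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Cyclic coordinate descent for the lasso,
        min_b (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    Each coordinate update is a one-dimensional soft-thresholding step,
    with the residual kept in sync incrementally."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                  # running residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (beta[j] - new)        # incremental residual update
            beta[j] = new
    return beta

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + 0.1 * rng.standard_normal(100)
print(lasso_cd(X, y, lam=0.1).round(2))           # shrunken estimate; larger lam gives more zeros
```

With lam at or above max_j |x_j'y|/n the all-zero vector is a fixed point, and with lam = 0 the iteration converges to ordinary least squares.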

Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these r...

The hybrid Huberized support vector machine (HHSVM) has proved its advantages over the ℓ1 support vector machine (SVM) in terms of classification and variable selection. Similar to the ℓ1 SVM, the HHSVM enjoys a piecewise linear path property and can be computed by a LARS-type piecewise linear solution path algorithm. In this paper we propose a gen...

Compressed sensing is a very powerful and popular tool for sparse recovery of high dimensional signals. Random sensing matrices are often employed in compressed sensing. In this paper we introduce a new method named aggressive betting using sure independence screening for sparse noiseless signal recovery. The proposal exploits the randomness struct...

Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models with high dimensionality can be challenging because of the large number of parameters that need to be estimated. In this letter, we propose a penalized likelihood estimator to address this difficu...

The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonp...

In linear regression problems with related predictors, it is desirable to do variable selection and estimation by maintaining the hierarchical or structural relationships among predictors. In this paper we propose non-negative garrote methods that can naturally incorporate such relationships defined through effect heredity principles or marginality...

Factor analysis is a popular multivariate analysis method which is used to describe observed variables as linear combinations of hidden factors. In applications one usually needs to rotate the estimated factor loading matrix in order to obtain a more understandable model. In this article, an ℓ1 penalization method is introduced for performing spar...

Local polynomial regression is a useful non-parametric regression tool to explore fine data structures and has been widely used in practice. We propose a new non-parametric regression technique called local composite quantile regression smoothing to improve local polynomial regression further. Sampling properties of the estimation procedure propose...

Local polynomial regression is a useful nonparametric regression tool to explore fine data structures and has been widely used in practice. In this paper, we propose a new nonparametric regression technique called local composite-quantile-regression (CQR) smoothing in order to further improve local polynomial regression. Sampling properties of the...

We consider efficient construction of nonlinear solution paths for general ℓ1-regularization. Unlike the existing methods that incrementally build the solution path through a combination of local linear approximation and recalibration, we propose an efficient global approximation to the whole solution path. With the loss function approximated by...

We consider the problem of model selection and estimation in situations where the number of parameters diverges with the sample size. When the dimension is high, an ideal method should have the oracle property (Fan and Li, 2001; Fan and Peng, 2004) which ensures the optimal large sample performance. Furthermore, the high-dimensionality often induce...

Fisher-consistent loss functions play a fundamental role in the construction of successful binary margin-based classifiers. In this paper we establish the Fisher-consistency condition for multicategory classification problems. Our approach uses the margin vector concept which can be regarded as a multicategory generalization of the binary margin. W...

We would like to take this opportunity to thank the discussants for their thoughtful comments and encouragements [ P. Bühlmann and L. Meier , ibid. 36, No. 4, 1534–1541 (2008; Zbl 1282.62096 ); X.-L. Meng , ibid. 36, No. 4, 1542–1552 (2008; Zbl 1282.62104 ); C.-H. Zhang , ibid. 36, No. 4, 1553–1560 (2008; Zbl 1282.62110 )] on our work [ibid. 36, No...

Fan & Li (2001) propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective function is nondifferentiable and nonconcave. In t...
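
The SCAD penalty of Fan & Li (2001) referenced here has a standard closed form: linear near zero (like the lasso), quadratic in a transition band, then constant so large coefficients escape shrinkage. A minimal elementwise sketch:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001), evaluated elementwise: lasso-like
    near zero, a quadratic blend for lam < |t| <= a*lam, then constant so
    that large coefficients are not over-shrunk."""
    t = np.abs(t)
    linear = lam * t
    quad = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    const = lam ** 2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

t = np.array([0.5, 2.0, 10.0])
print(scad_penalty(t, lam=1.0).round(3))   # one value from each of the three regions
```

The nondifferentiability at zero and the concavity of the transition band are exactly what makes maximizing the penalized likelihood computationally hard, which is the problem the abstract addresses.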