
Lijian Yang
Tsinghua University | TH · Department of Industrial Engineering · Center for Statistical Science
Ph.D. (1995), UNC-Chapel Hill; B.S. (1987), Peking University
functional moving average (FMA) and functional panel data, e.g., EEG; functional data with dependent error, e.g., ERP
About
Publications: 127
Reads: 19,424
Citations: 2,567
Introduction
data: time series, functional, high dimensional, sample survey; theory: simultaneous confidence region & oracle efficiency; applications: econometrics, genetics, agronomy, food science, brain science; honors: ASA Fellow, IMS Fellow, ISI Elected Member, IETI Distinguished Fellow, Tjalling C. Koopmans Econometric Theory Prize; Ph.D.'s: Michigan State University 7, Soochow University 3, Tsinghua University 3: 3 associate & 3 full professors in US, 3 associate professors & 3 lecturers in China
Education
August 1990 - December 1995
August 1990 - August 1993
September 1983 - June 1987
Publications (127)
We propose a kernel estimator for the distribution function of unobserved errors in autoregressive time series, based on residuals computed by estimating the autoregressive coefficients with the Yule-Walker method. Under mild assumptions, we establish oracle efficiency of the proposed estimator, that is, it is asymptotically as efficient as the kerne...
Functional data analysis (FDA) has become an important area of statistics research in the recent decade, yet a smooth simultaneous confidence corridor (SCC) does not exist in the literature for the mean function of sparse functional data. SCC is a powerful tool for making statistical inference on an entire unknown function, nonetheless classic “Hun...
Most time series that are encountered in practice contain non-zero trend, yet textbook approaches to time series analysis are typically focused on zero-mean stationary auto-regressive moving average (ARMA) processes. Trend is often estimated by ad hoc methods and subtracted from time series, and the residuals are used as the true ARMA noise for data...
Generalized additive models (GAM) are multivariate nonparametric regressions for non-Gaussian responses including binary and count data. We propose a spline-backfitted kernel (SBK) estimator for the component functions. Our results are for weakly dependent data and we prove oracle efficiency. The SBK technique is bo...
Application of nonparametric and semiparametric regression techniques to high-dimensional time series data has been hampered due to the lack of effective tools to address the “curse of dimensionality.” Under rather weak conditions, we propose spline-backfitted kernel estimators of the component functions for the nonlinear additive time series dat...
A key step to the establishment of a tiered healthcare system is equitable access to basic primary healthcare services for all. However, no quantitative research on the national status quo of primary healthcare accessibility in China exists. We filled this gap by estimating spatial accessibility to primary healthcare centers (PHCs) and mapping its...
We investigate statistical inference for the mean function of stationary functional time series data with an infinite moving average structure. We propose a B-spline estimation for the temporally ordered trajectories of the functional moving average, which are used to construct a two-step estimator of the mean function. Under mild conditions, the B...
Claims about distributions of time series are often unproven assertions instead of substantiated conclusions for lack of hypothesis testing tools. In this work, Kolmogorov–Smirnov type simultaneous confidence bands (SCBs) are constructed based on simple random samples (SRSs) drawn from realizations of time series, together with smooth SCBs using ke...
Estimation and testing is studied for functional data with temporally dependent errors, an interesting example of which is the event-related potential (ERP). B-spline estimators are formulated for individual smooth trajectories and their population mean as well. The mean estimator is shown to be oracally efficient in the sense that it is as efficie...
Maximum likelihood estimator (MLE) and Bayesian Information Criterion (BIC) order selection are examined for ARMA time series with slowly varying trend to validate the well-known detrending technique of moving average [Section 1.4, Brockwell, P.J., and Davis, R.A. (1991), Time Series: Theory and Methods, New York: Springer-Verlag]. In step one, a m...
Statistical inference for functional time series is investigated by extending the classic concept of autocovariance function (ACF) to functional ACF (FACF). It is established that for functional moving average (FMA) data, the FMA order can be determined as the highest nonvanishing order of FACF, just as in classic time series analysis. A two-step e...
A Kolmogorov-Smirnov (K-S) simultaneous confidence band (SCB) is constructed for the error distribution of dense functional data based on kernel distribution estimator (KDE). The KDE is computed from residuals of B-spline trajectories over a smaller number of measurements, whereas the B-spline trajectories are computed from the remaining larger set o...
This study aims to explore the possibility of predicting the dispositional level of dialectical thinking using resting-state electroencephalography signals. Thirty-four participants completed a self-reported measure of dialectical thinking, and their resting-state electroencephalography was recorded. After wave filtration and eye movement removal,...
Motivated by recent data analyses in biomedical imaging studies, we consider a class of image-on-scalar regression models for imaging responses and scalar predictors. We propose using flexible multivariate splines over triangulations to handle the irregular domain of the objects of interest on the images, as well as other characteristics of images....
Asymptotically correct simultaneous confidence bands (SCBs) are proposed in both multiplicative and additive form to compare variance functions of two samples in the nonparametric regression model based on deterministic designs. The multiplicative SCB is based on two-step estimation of ratio of the variance functions, which is as efficient, up to o...
A smooth simultaneous confidence band (SCB) is constructed for the distribution of unobserved errors in a nonparametric regression model based on a plug-in kernel distribution estimator. The normalized estimation error process is shown to converge to a Gaussian process. Simulation experiments indicate that the proposed SCB not only strikes an intel...
This study aims to explore the possibility of predicting the dispositional level of dialectical thinking using resting-state electroencephalography signals. Thirty-four participants completed a self-reported measure of dialectical thinking, and their resting-state electroencephalography was recorded. After wave filtration and eye signal removal, ti...
We consider the estimation of the boundary of a set when it is known to be sufficiently smooth, to satisfy certain shape constraints and to have an additive structure. Our proposed method is based on spline estimation of a conditional quantile regression and is resistant to outliers and/or extreme values in the data. This work is a desirable extens...
This paper concerns the comparison of two-sample nonparametric regression. An asymptotically correct simultaneous confidence band (SCB) is proposed for the difference of two-sample nonparametric regression functions to achieve the goal of comparison. Simulation experiments provide strong evidence that corroborates the asymptotic theory. The propo...
The popularity of a fashion item depends on its color, shape, texture, and price. For different items (with all attributes identical except color) of a specific product, fashion retailers need to learn consumer color preference and decide their order quantities accordingly to match their products with consumer demand. This study aims to predict con...
Production frontier is an important concept in modern economics and has been widely used to measure production efficiency. Existing nonparametric frontier models often only allow one or low-dimensional input variables due to ‘curse-of-dimensionality’. In this paper we propose a flexible additive frontier model which quantifies the effects of multip...
A time-varying autoregressive conditional heteroskedasticity (ARCH) model is proposed to describe the changing volatility of a financial return series over a long time horizon, along with two-step least squares and maximum likelihood estimation procedures. After preliminary estimation of the time-varying trend in volatility scale, approximations t...
Spatial lifecourse epidemiology is an interdisciplinary field that utilizes advanced spatial, location-based, and artificial intelligence technologies to investigate the long-term effects of environmental, behavioural, psychosocial, and biological factors on health-related states and events and the underlying mechanisms. With the growing number of...
The inference via simultaneous confidence band is studied for stationary covariance function of dense functional data. A two-stage estimation procedure is proposed based on spline approximation, the first stage involving estimation of all the individual trajectories and the second stage involving estimation of the covariance function through smooth...
Background:
There is always a demand for fast and accurate algorithms for EEG signal processing. Owing to the high sample rate, EEG signals usually come with a large number of sample points, making it difficult to predict the working memory ability in cognitive research with EEG.
New method:
Following well-designed experiments, the functional li...
Existing functional data analysis literature has mostly overlooked data with spikes in mean, such as weekly sporting goods sales by a salesperson which spikes around holidays. For such functional data, two-step estimation procedures are formulated for the population mean function and holiday effect parameters, which correspond to the population sal...
Inference via simultaneous confidence band is studied for stationary covariance function of dense functional data. A two-stage estimation procedure is proposed based on spline approximation, the first stage involving estimation of all the individual trajectories and the second stage involving estimation of the covariance function through smoothing...
Estimation and Inference for Generalized Geoadditive Models. In many application areas, data are collected on a count or binary response with spatial covariate information. In this paper, we introduce a new class of generalized geoadditive models (GGAMs) for spatial data distributed over complex domains. Through a link function, the proposed GGAM as...
Asymptotically correct simultaneous confidence bands (SCBs) are proposed for the mean and variance functions of a nonparametric regression model based on deterministic designs. The variance estimation is as efficient, up to order n^{-1/2}, as an infeasible estimator if the mean function were known. Simulation experiments provide strong evidence that c...
Stratified sampling is one of the most important survey sampling approaches and is widely used in practice. In this paper, we consider the estimation of the distribution function of a finite population in stratified sampling by the empirical distribution function (EDF) and kernel distribution estimator (KDE), respectively. Under general conditions,...
A plug-in estimator is proposed for a local measure of variance explained by regression, termed correlation curve in Doksum et al. (J Am Stat Assoc 89:571–582, 1994), consisting of a two-step spline–kernel estimator of the conditional variance function and local quadratic estimator of first derivative of the mean function. The estimator is oracally...
A kernel distribution estimator (KDE) is proposed for multi-step-ahead prediction error distribution of autoregressive time series, based on prediction residuals. Under general assumptions, the KDE is proved to be oracally efficient as the infeasible KDE and the empirical cumulative distribution function (cdf) based on unobserved prediction errors....
Simultaneous confidence bands (SCBs) are proposed for the distribution function of a finite population and of the latent superpopulation via the empirical distribution function (nonsmooth) and kernel distribution estimator (smooth) based on a simple random sample (SRS), either with or without finite population correction. It is shown that both nons...
In spite of widespread use of generalized additive models (GAMs) to remedy the “curse of dimensionality”, there is no well-grounded methodology developed for simultaneous inference and variable selection for GAM in existing literature. However, both are essential in enhancing the capability of statistical models. To this end, we establish simultane...
The semiparametric GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) model of Yang (2006, Journal of Econometrics 130, 365-384) has combined the flexibility of a nonparametric link function with the dependence on infinitely many past observations of the classic GARCH model. We propose a cubic spline procedure to estimate the unknown...
We propose a data-driven method to select significant variables in additive model via spline estimation. The additive structure of the regression model is imposed to overcome the ‘curse of dimensionality’, while the spline estimators provide a good approximation to the additive components of the model. The additive components are ordered according...
We consider nonparametric estimation of the covariance function for dense functional data using computationally efficient tensor product B-splines. We develop both local and global asymptotic distributions for the proposed estimator, and show that our estimator is as efficient as an "oracle" estimator where the true mean function is known. Simultan...
A smooth simultaneous confidence band (SCB) is obtained for heteroscedastic variance function in nonparametric regression by applying spline regression to the conditional mean function followed by Nadaraya–Watson estimation using the squared residuals. The variance estimator is uniformly oracally efficient, that is, it is as efficient as, up to ord...
We consider the problem of estimating a relationship nonparametrically using regression splines when there exist both continuous and categorical predictors. We combine the global properties of regression splines with the local properties of categorical kernel functions to handle the presence of categorical predictors rather than resorting to sample...
Over the last twenty-five years, various √n-consistent estimators have been devised for the coefficient vector in the popular semiparametric single-index model. In this paper, we prove under general assumptions that the kernel estimator of the link function by a univariate regression on the index variable is oracally efficient, namely, the estimator...
We present a method of using local linear smoothing to construct simultaneous confidence bands for the mean function of densely spaced functional data. Our approach works well under mild conditions. In addition, the local linear estimator and its accompanying confidence band enjoy semiparametric efficiency in the sense that they are asymptotically...
We consider a varying coefficient regression model for sparse functional data, with time varying response variable depending linearly on some time-independent covariates with coefficients as functions of time-dependent covariates. Based on spline smoothing, we propose data-driven simultaneous confidence corridors for the coefficient functions with...
In this paper, we consider the uniform strong consistency of the cumulative distribution function estimator in nonparametric regression. We obtain the extended Glivenko–Cantelli theorem for the residual-based empirical distribution function.
In spite of the widespread use of generalized additive models (GAMs), there is no well-established methodology for simultaneous inference and variable selection for the components of GAM. There is no doubt that both inference on the marginal component functions and their selection are essential in these additive statistical models. To this end, we...
We consider a varying coefficient regression model for sparse functional data, with time varying response variable depending linearly on some time independent covariates with coefficients as functions of time dependent covariates. Based on spline smoothing, we propose data driven simultaneous confidence corridors for the coefficient functions with...
Many statistical models arising in applications contain non- and weakly-identified parameters. Due to identifiability concerns, tests concerning the parameters of interest may not be able to use conventional theories and it may not be clear how to assess statistical significance. This paper extends the literature by developing a testing procedure t...
A plug-in kernel estimator is proposed for Hölder continuous cumulative distribution function (cdf) based on a random sample. Uniform closeness between the proposed estimator and the empirical cdf estimator is established, while the proposed estimator is smooth instead of a step function. A smooth simultaneous confidence band is constructed based o...
Time series often contain unknown trend functions and unobservable error terms. As is known, Yule-Walker estimators are asymptotically efficient for autoregressive time series. The focus of this article is the Yule-Walker estimators for time series with trends. A nonparametric detrending procedure is proposed. It is concluded that the asymptotic pr...
The paper considers the construction of a confidence band for the trend function of a stationary time series. An explicit formula is derived based on polynomial splines and Sunklodas (1984). The performance of the confidence band is illustrated by simulation studies. The proposed method is applied to the analysis of the annual yields of wheat in th...
A polynomial spline estimator is proposed for the mean function of dense functional data together with a simultaneous confidence band which is asymptotically correct. In addition, the spline estimator and its accompanying confidence band enjoy oracle efficiency in the sense that they are asymptotically the same as if all random trajectories are obs...
Functional data analysis has received considerable recent attention and a number of successful applications have been reported. In this paper, asymptotically simultaneous confidence bands are obtained for the mean function of the functional regression model, using piecewise constant spline estimation. Simulation experiments corroborate the asympto...
We consider a class of semiparametric GARCH models with additive autoregressive components linked together by a dynamic coefficient. We propose estimators for the additive components and the dynamic coefficient based on spline smoothing. The estimation procedure involves only a small number of least squares operations, thus it is computationally ef...
The article considers the Yule-Walker estimator of the autoregressive coefficient based on the observed time series that contains an unknown trend function and an autoregressive error term. The trend function is estimated by means of B-splines and then subtracted from the observations. The Yule-Walker estimator is obtained from the residual sequenc...
Background and objectives: Motivated by a study on breast cancer, we consider the problem of evaluating a statistical hypothesis when some model characteristics are potentially non- or weakly identifiable from observed data. Such scenarios are common in longitudinal studies for evaluating a covariate effect when dropouts may be informative. The hy...
Because of the growing interest in nutraceuticals and their health benefits, it is important to develop tools for modeling degradation of nutraceuticals in low-moisture- and high-temperature-heated foods. The objective of this study was to estimate the kinetic parameters for the degradation of anthocyanins in grape pomace and to calculate the boots...
Motivation:
The genetic basis of complex traits often involves the function of multiple genetic factors, their interactions and the interaction between the genetic and environmental factors. Gene-environment (G×E) interaction is considered pivotal in determining trait variations and susceptibility of many genetic disorders such as neurodegenerativ...
In a random-design nonparametric regression model, procedures for detecting jumps in the regression function via constant and linear spline estimation method are proposed based on the maximal differences of the spline estimators among neighbouring knots, the limiting distributions of which are obtained when the regression function is smooth. Simula...
Generalized additive models (GAM) are multivariate nonparametric regressions for non-Gaussian responses including binary and count data. We propose a spline-backfitted kernel (SBK) estimator for the component functions. Our results are for weakly dependent data and we prove oracle efficiency. The SBK technique is both computationally expedient and t...
A spline-backfitted kernel smoothing method is proposed for partially linear additive model. Under assumptions of stationarity and geometric mixing, the proposed function and parameter estimators are oracally efficient and fast to compute. Such superior properties are achieved by applying to the data spline smoothing and kernel smoothing consecutiv...
Longitudinal data analysis is a central piece of statistics. The data are curves and they are observed at random locations. This makes the construction of a simultaneous confidence corridor (SCC) (confidence band) for the mean function a challenging task on both the theoretical and the practical side. Here we propose a method based on local linear...
Longitudinal data analysis is a central piece of statistics. The data are curves and they are observed at random locations. This makes the construction of a simultaneous confidence corridor (SCC) (confidence band) for the mean function a challenging task on both the theoretical and...
Although many types of confidence bands exist for nonparametric regression with i.i.d. data, theoretical properties of such bands have never been established under dependence. We propose simultaneous confidence bands for nonparametric prediction function of time-series data using spline estimation. Asymptotic properties are established under the a...
Under weak conditions of smoothness and mixing, we propose spline-backfitted spline (SBS) estimators of the component functions for a nonlinear additive autoregression model that is both computationally expedient for analyzing high dimensional large time series data, and theoretically reliable as the estimator is oracally efficient and comes with a...
Additive coefficient model (Xue and Yang, 2006a, 2006b) is a flexible regression and autoregression tool that circumvents the “curse of dimensionality”. We propose spline-backfitted kernel (SBK) and spline-backfitted local linear (SBLL) estimators for the component functions in the additive coefficient model that are both (i) computationally expedient so they are usable for...
A great deal of effort has been devoted to the inference of additive model in the last decade. Among existing procedures, the kernel type are too costly to implement for high dimensions or large sample sizes, while the spline type provide no asymptotic distribution or uniform convergence. We propose a one step backfitting estimator of the component...
Asymptotically exact and conservative confidence bands are obtained for possibly heteroscedastic variance functions, using piecewise constant and piecewise linear spline estimation, respectively. The variance estimation is as efficient as an infeasible estimator when the conditional mean function is known, and the widths of the confidence bands are...
For the past two decades, the single-index model, a special case of projection pursuit regression, has proven to be an efficient way of coping with the high-dimensional problem in nonparametric regression. In this paper, based on a weakly dependent sample, we investigate a robust single-index model, where the single-index is identified by the best...
Asymptotically exact and conservative confidence bands are obtained for a nonparametric regression function, using piecewise constant and piecewise linear spline estimation, respectively. Compared to the pointwise confidence interval of Huang (2003), the confidence bands are inflated by a factor proportional to {log(n)}^{1/2}, with the same width...
A smooth kernel estimator is proposed for multivariate cumulative distribution functions (cdf), extending the work of Yamato [H. Yamato, Uniform convergence of an estimator of a distribution function, Bull. Math. Statist. 15 (1973), pp. 69–78.] on univariate distribution function estimation. Under assumptions of strict stationarity and geometricall...
Crop yields are highly variable spatially and temporally as a result of complex interactions among topography, weather conditions, and management practices. The objective of this study was to analyze the effects of management practices on the relationship between crop yields and precipitation and crop yields and topography using 10 yr of yield data...
Degradation of nutraceuticals in low- and intermediate-moisture foods heated at high temperature (>100 degrees C) is difficult to model because of the nonisothermal condition. Isothermal experiments above 100 degrees C are difficult to design because they require high pressure and small sample size in sealed containers. Therefore, a nonisothermal m...
Additive model has been widely recognized as an effective tool for dimension reduction. Existing methods for estimation of additive regression function, including backfitting, marginal integration, projection and spline methods, do not provide any level of uniform confidence. In this paper we propose a simple construction of confidence band for t...
For the past two decades, single-index model, a special case of projection pursuit regression, has proven to be an efficient way of coping with the high dimensional problem in nonparametric regression. In this paper, based on weakly dependent sample, we investigate the single-index prediction (SIP) model which is robust against deviation from the s...
A seasonal additive nonlinear vector autoregression (SANVAR) model is proposed for multivariate seasonal time series to explore the possible interaction among the various univariate series. Significant lagged variables are selected and additive autoregression functions estimated based on the selected variables using spline smoothing method. Conserv...
To investigate the acute response of immature articular cartilage, in the distraction and consolidation phases, to 30% tibial lengthening.
Sixteen immature New Zealand white rabbits underwent diaphyseal lengthening of the left tibia by callotasis at a distraction rate of 0.4mm twice daily. A sham control group of 12 rabbits underwent fixation and o...
Due to difficulty in computing, confidence intervals (CIs) for kinetic parameters and the predicted dependent variable (Y) in nonlinear models are often not reported. The purpose of this work was to present a straightforward method to calculate asymptotic CIs for kinetic parameters and the associated Y variable for nonisothermal survivor or retenti...
A flexible nonparametric regression model is considered in which the response depends linearly on some covariates, with regression coefficients as additive functions of other covariates. Polynomial spline estimators are proposed for the unknown coefficient functions, with optimal univariate mean square convergence rate under geometric mixing condi...