Puying Zhao's research while affiliated with Yunnan University and other places

Publications (17)

Preprint
Statistical inference in the presence of nuisance functionals with complex survey data is an important topic in social and economic studies. The Gini index, Lorenz curves and quantile shares are among the commonly encountered examples. The nuisance functionals are usually handled by a plug-in nonparametric estimator and the main inferential procedu...
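To illustrate the plug-in idea for the Gini index, the Python sketch below estimates the nuisance functional (the population distribution function F) with a survey-weighted empirical CDF. It then plugs that estimate into G = 2 E[F(Y) Y] / μ − 1. It assumes design weights d_i = 1/π_i are supplied and that incomes are nonnegative. The function name `weighted_gini` and the mid-point CDF convention are illustrative choices, not the estimator from the paper.

```python
from typing import Sequence


def weighted_gini(y: Sequence[float], d: Sequence[float]) -> float:
    """Plug-in Gini index from survey data (illustrative sketch).

    y: sampled incomes, assumed nonnegative and not all zero
    d: design weights d_i = 1 / pi_i, assumed positive
    """
    if len(y) == 0 or len(y) != len(d):
        raise ValueError("y and d must be non-empty and of equal length")
    units = sorted(zip(y, d))                   # order units by income
    n_hat = sum(di for _, di in units)          # estimated population size
    total = sum(yi * di for yi, di in units)    # estimated population total
    cum_weight = 0.0
    acc = 0.0
    for yi, di in units:
        # Nuisance functional: mid-point weighted empirical CDF evaluated at y_i
        f_hat = (cum_weight + di / 2.0) / n_hat
        acc += di * f_hat * yi
        cum_weight += di
    # Plug-in version of G = 2 E[F(Y) Y] / mu - 1
    return 2.0 * acc / total - 1.0
```

With equal weights, the sketch reproduces the usual sample Gini index. For example, incomes 1, 2, 3, 4 give 0.25. The mid-point CDF also makes an integer weight behave like that many replicated units.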
Article
In missing data analysis, it is challenging to estimate the propensity score (PS) function. Traditional parametric, nonparametric or semiparametric approaches to estimate the PS function may be subject to model misspecification or lead to inefficient estimation. To address the aforementioned issues, we here assume that the PS function is unknown, a...
Article
To estimate distribution functions and quantiles of a response variable when the data have nonignorable nonresponse and the covariate dimension is not low, this article assumes that the propensity follows a general semiparametric model, while the distribution of the response variable and related covariates is left unspecified. To address the identifi...
Preprint
Public-use survey data are an important source of information for researchers in social science and health studies to build statistical models and make inferences on the target finite population. This paper presents two general inferential tools through the pseudo empirical likelihood and the sample empirical likelihood methods. Theoretical results...
Article
This paper provides a rigorous treatment of design-based estimating equation inference using complex survey data in the presence of nuisance functionals. The proposed design-based framework covers parameters from inequality measures and measures of performance evaluation in economic, business and financial studies, but the scope of the paper is more...
Article
We propose a Bayesian empirical likelihood approach to survey data analysis on a vector of finite population parameters defined through estimating equations. Our method allows overidentified estimating equation systems and is applicable to both smooth and non‐differentiable estimating functions. Our proposed Bayesian estimator is design consistent...
Article
Wilks's theorem is useful for constructing confidence regions. When the popular empirical likelihood is applied to data with nonignorable nonresponse, Wilks's phenomenon does not hold. This paper shows that this failure is caused by the extra estimation of the nuisance parameter in the nonignorable nonresponse propensity. Motivated by this result, we prop...
Article
This paper provides an overview of two parallel approaches to design‐based inference with complex survey data: the pseudo empirical likelihood methods and the sample empirical likelihood methods. The general framework covers parameters defined through smooth or non‐differentiable estimating functions for analytic use of survey data as well as descr...
Article
We consider identification and estimation in a longitudinal study with nonignorable nonmonotone nonresponse in the responses. To handle the identifiability issue, we use a baseline covariate, called a nonresponse instrument, that can be excluded from the nonresponse propensity conditional on other observed covariates and the variables subject to nonresp...
Article
Efficient statistical inference on nonignorable missing data is a challenging problem. This paper proposes a new estimation procedure based on composite quantile regression (CQR) for linear regression models with nonignorable missing data, that is applicable even with high-dimensional covariates. A parametric model is assumed for modelling response...
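For readers unfamiliar with CQR, the sketch below evaluates the composite quantile regression objective for a linear model. The objective is the sum over K quantile levels τ_k = k/(K + 1) of the check losses ρ_τ(u) = u(τ − 1{u < 0}), with a separate intercept b_k for each level and a common slope vector β. The optional per-unit weights are an assumption, standing in for inverse propensity weights. This shows only the loss that would be minimized, not the paper's estimation procedure, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cqr_objective(
    y: Sequence[float],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
    b: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Composite quantile regression loss (illustrative sketch).

    Uses K = len(b) equally spaced levels tau_k = k / (K + 1), one intercept
    b_k per level and a slope vector beta shared across levels. `weights` is
    a hypothetical stand-in for inverse propensity weights; None means 1.
    """
    K = len(b)
    taus = [k / (K + 1) for k in range(1, K + 1)]
    if weights is None:
        weights = [1.0] * len(y)
    total = 0.0
    for yi, xi, wi in zip(y, X, weights):
        fit = sum(xij * bj for xij, bj in zip(xi, beta))  # x_i' beta
        for tau, bk in zip(taus, b):
            total += wi * check_loss(yi - bk - fit, tau)
    return total
```

Sharing β across all K levels while letting the intercepts vary is what distinguishes CQR from fitting K separate quantile regressions.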
Article
Handling data with the missing not at random (MNAR) mechanism is still a challenging problem in statistics. In this article, we propose a nonparametric imputation method based on the propensity score in a general class of semiparametric models for nonignorable missing data. Compared with the existing imputation methods, the proposed imputation meth...
Article
In a linear regression model with nonignorable missing covariates, non-normal errors or outliers can lead to badly biased and misleading results when standard parameter estimation methods based on either least squares or likelihood are used. A propensity score method with a robust and efficient regression procedure called composite quantile r...
Article
We develop empirical likelihood (EL) inference for parameters in generalized estimating equations with nonignorably missing response data. We consider an exponential tilting model for the nonignorable missingness mechanism, and propose modified estimating equations that impute missing data through a kernel regression method. We establish some asympto...

Citations

... Zhong and Rao (2000) studied EL inference on the population mean under stratified sampling. Finite population parameters defined through the so-called census estimating equations and the related inferential procedures have been discussed by Chen and Kim (2014) and Zhao et al. (2022) through the sample EL approach as well as the pseudo EL approach (Zhao and Wu, 2019). For parameters defined through U-statistics, jackknife EL can be used to reduce the computational complexity (Chen and Tabri, 2021). ...
... The proposed method relies essentially on having a readily available instrument, which is often unrealistic in practice. Some technical, non-design-based guidelines for the choice of viable instruments can be found elsewhere (e.g., Zhao et al., 2021). Additionally, the proposed method uses kernel-based estimation of the unknown function g(·), which may prove computationally difficult with high-dimensional auxiliary variables u. ...
... Empirical likelihood ratio confidence intervals have also been shown to be practically useful for adaptive cluster sampling (Salehi et al., 2010), fish abundance surveys (Chen et al., 2004), and the Gini index and other inequality measures with economic survey data (Qin et al., 2010; Zhao et al., 2020c). Advanced theoretical developments on empirical likelihood for complex survey data analysis (Zhao et al., 2020a, 2020b, 2020c, 2022) currently remain at the research stage. It is expected that applications of these methods for analytic use of survey data will catch up in the coming years. ...
... The most commonly used strategy is to use a two-step procedure where a consistent nonparametric estimator for the nuisance functional is constructed first and then used in the second step as a plug-in estimator for inferences on the main parameters of interest. Zhao et al. (2020) is among the first to discuss the design-based two-step EL method and the generalized method of moments (GMM) method for complex survey data in the presence of nuisance functionals. The two-step survey weighted estimating equations (SWEE) approach discussed by Zhao et al. (2020), however, has two major limitations. ...
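A toy version of the two-step procedure uses an income quantile share as the main parameter, which is the kind of parameter mentioned in the first abstract above. Step 1 estimates the p-th quantile q_p, which is the nuisance functional, from the survey-weighted empirical CDF. Step 2 plugs the estimate into the weighted ratio Σ d_i y_i 1{y_i ≤ q̂_p} / Σ d_i y_i. The function names and the lower-quantile convention are assumptions. This is point estimation only; it does not address the inferential issues the cited work is concerned with.

```python
from typing import Sequence


def weighted_quantile(y: Sequence[float], d: Sequence[float], p: float) -> float:
    """Step 1: plug-in estimate of the p-th quantile (the nuisance functional).

    Returns the smallest sampled y_i whose survey-weighted empirical CDF is >= p.
    """
    units = sorted(zip(y, d))
    n_hat = sum(di for _, di in units)
    cum_weight = 0.0
    for yi, di in units:
        cum_weight += di
        if cum_weight / n_hat >= p:
            return yi
    return units[-1][0]  # guards against floating-point round-off when p is near 1


def bottom_share(y: Sequence[float], d: Sequence[float], p: float = 0.2) -> float:
    """Step 2: share of the estimated total held by units at or below q_p."""
    q_hat = weighted_quantile(y, d, p)  # plug-in from step 1
    total = sum(di * yi for yi, di in zip(y, d))
    below = sum(di * yi for yi, di in zip(y, d) if yi <= q_hat)
    return below / total
```

Take incomes 1 through 10 with equal weights and p = 0.2. Step 1 gives q̂ = 2, and step 2 gives a bottom-20% share of 3/55.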
... The BEL method has been applied in many fields. For complex surveys, Rao and Wu (2010) constructed Bayesian empirical likelihood intervals for a population mean, and Zhao, Ghosh, et al. (2020) proposed a Bayesian empirical likelihood for finite population parameters defined through estimating equations. Chaudhuri and Ghosh (2011) applied Bayesian empirical likelihood to small area estimation. ...
... It is apparent that the pseudo empirical likelihood function P_EL(p) involves only the first-order inclusion probabilities, whereas valid design-based confidence intervals and hypothesis tests typically require the second-order inclusion probabilities π_ij = P(i, j ∈ S). Zhao and Wu (2019) showed that the limiting distribution of −2r_PEL(θ_N) is a weighted χ² whose weights involve the design-based variance of U_n(θ_N). When θ is a scalar, the limiting distribution of −2r_PEL(θ_N)/deff is a standard χ² with one degree of freedom, where "deff" is the design effect (Wu and Rao, 2006). ...
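To make the design-effect adjustment concrete, here is a minimal Python sketch of −2r_PEL(θ)/deff for a finite population mean, with estimating function y − θ. It assumes the normalization P_EL(p) = n Σ d̃_i log p_i, with d̃_i = d_i / Σ_j d_j. It also takes deff as a user-supplied number rather than estimating it. The Lagrange multiplier is found by bisection. The function name and tolerances are illustrative.

```python
import math
from typing import Sequence


def pel_statistic(
    y: Sequence[float],
    d: Sequence[float],
    theta: float,
    deff: float = 1.0,
    tol: float = 1e-12,
    max_iter: int = 200,
) -> float:
    """Design-effect adjusted pseudo EL ratio -2 r_PEL(theta) / deff for a mean.

    Assumes P_EL(p) = n * sum_i dt_i * log(p_i) with dt_i = d_i / sum_j d_j,
    maximized subject to sum_i p_i = 1 and sum_i p_i * (y_i - theta) = 0.
    `deff` is supplied by the user (illustrative sketch, not a library API).
    """
    n = len(y)
    d_sum = sum(d)
    dt = [di / d_sum for di in d]
    z = [yi - theta for yi in y]
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return math.inf  # theta not strictly inside the data range: no solution

    def score(lam: float) -> float:
        # Lagrange condition sum_i dt_i z_i / (1 + lam z_i) = 0; decreasing in lam
        return sum(w * zi / (1.0 + lam * zi) for w, zi in zip(dt, z))

    # Bracket that keeps 1 + lam * z_i > 0 for every unit
    lo = -(1.0 - 1e-10) / z_max
    hi = -(1.0 - 1e-10) / z_min
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    # p_i = dt_i / (1 + lam z_i), so r_PEL = n * sum_i dt_i log(p_i / dt_i)
    r_pel = -n * sum(w * math.log1p(lam * zi) for w, zi in zip(dt, z))
    return -2.0 * r_pel / deff
```

Under this sketch, θ belongs to an approximate 95% confidence interval when the statistic is at most 3.841, the 0.95 quantile of χ² with one degree of freedom. To obtain the interval, scan θ over a grid and keep the values that pass.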
... Drew & Fuller (1980) defined a hard-core nonresponse group as the part of the population that never responds to surveys. Studies have also classified response groups in longitudinal data by response pattern into monotone and non-monotone nonresponse groups (Zao, 2020; Sadi, 2014; Zhao et al., 2017; Qian and Xie, 2009). Assuming that all participants responded in the first wave, the t-th wave response group is divided into two groups: a continuous response group that responded continuously through wave t − 1, and a noncontinuous response group that responded intermittently (see Figure 1). ...
... When regression models with nonignorable missing data are sparse, several regularization approaches have been proposed for both low- and high-dimensional settings. For example, Zhao et al. (2017) developed a robust variable selection approach for linear regression models with nonignorable missing covariates, and Zhao et al. (2018) suggested a penalized pairwise pseudo likelihood procedure for variable selection with nonignorable missing data. However, in ultrahigh-dimensional settings, these penalized estimation approaches do not work well because of the nature of the sparsity in the PS models. ...
... Here we propose to perform sensitivity analysis of ACCMV via an exponential tilting approach (Kim and Yu, 2011; Shao and Wang, 2016; Zhao et al., 2017). For a = 1_d, recall that the ACCMV in equation (10) requires: ...
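The exponential tilting idea behind such sensitivity analyses can be illustrated in the simplest setting: a single outcome, no covariates, and a fixed sensitivity parameter γ. The sketch assumes f(y | δ = 0) ∝ f(y | δ = 1) exp(γy). Under that assumption, the nonrespondent mean is estimated by reweighting respondents by exp(γ y_i), and γ = 0 reduces to the respondent mean. This Python sketch is a simplified illustration, not the ACCMV procedure, and the function name is hypothetical.

```python
import math
from typing import Sequence


def tilted_mean(y_obs: Sequence[float], n_missing: int, gamma: float) -> float:
    """Population mean under an exponential tilting model with fixed gamma.

    Illustrative sketch: assumes f(y | delta=0) is proportional to
    f(y | delta=1) * exp(gamma * y), with no covariates, and estimates
    f(y | delta=1) by the respondents' empirical distribution.
    """
    n_obs = len(y_obs)
    if n_obs == 0:
        raise ValueError("need at least one respondent")
    # Subtract the largest exponent before exponentiating to avoid overflow
    shift = max(gamma * yi for yi in y_obs)
    tilt = [math.exp(gamma * yi - shift) for yi in y_obs]
    # Estimated mean among nonrespondents: respondents reweighted by exp(gamma * y)
    mu_missing = sum(t * yi for t, yi in zip(tilt, y_obs)) / sum(tilt)
    mu_obs = sum(y_obs) / n_obs
    return (n_obs * mu_obs + n_missing * mu_missing) / (n_obs + n_missing)
```

To see how sensitive the estimate is to departures from ignorable nonresponse, re-run the function over a grid of γ values, for example from −1 to 1.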
... Under nonignorable nonresponse, Kim and Im (2014) considered propensity score weighting adjustment for the follow-up sample based on the method-of-moments estimator of the propensity score. Jiang et al. (2016) presented a propensity score adjustment method for regression models with nonignorable missing covariates based on the semiparametric estimation of the propensity score. Zhao et al. (2018) discussed the generalized method of moments (GMM) for estimating the propensity score model parameters and the inverse probability weighting estimator of the population mean in longitudinal surveys under nonignorable nonmonotone nonresponse. ...