Efficient Estimation of Population-Level Summaries in General Semiparametric Regression Models

Journal of the American Statistical Association (Impact Factor: 1.83). 02/2007; 102(March):123-139. DOI: 10.2307/27639826
Source: RePEc

ABSTRACT This paper considers a wide class of semiparametric regression models in which interest focuses on population-level quantities that combine both the parametric and nonparametric parts of the model. Special cases in this approach include generalized partially linear models, generalized partially linear single index models, structural measurement error models and many others. For estimating the parametric part of the model efficiently, profile likelihood kernel estimation methods are well-established in the literature. Here our focus is on estimating general population-level quantities that combine the parametric and nonparametric parts of the model, e.g., population mean, probabilities, etc. We place this problem into a general context, provide a general kernel-based methodology, and derive the asymptotic distributions of estimates of these population-level quantities, showing that in many cases the estimates are semiparametric efficient. For estimating the population mean with no missing data, we show that the sample mean is semiparametric efficient for canonical exponential families, but not in general. We apply the methods to a problem in nutritional epidemiology, where estimating the distribution of usual intake is of primary interest, and semiparametric methods are not available. Extensions to the case of missing response data are also discussed.

  • ABSTRACT: Distribution function estimation plays a foundational role in statistics, since the population distribution is always involved in statistical inference and is usually unknown. In this paper, we consider estimating the distribution function of a response variable Y when responses are missing in regression problems. It is proved that the augmented inverse probability weighted estimator converges weakly to a zero-mean Gaussian process. An augmented inverse probability weighted empirical log-likelihood function is also defined. It is shown that the empirical log-likelihood converges weakly to the square of a Gaussian process with mean zero and variance one. We apply these results to construct confidence bands for the distribution function of Y, based on the Gaussian process approximation and on empirical likelihood. A simulation is conducted to evaluate the confidence bands.
    Journal of Statistical Planning and Inference 01/2010; 140(9):2778-2789. · 0.71 Impact Factor
  • ABSTRACT: We consider regression models with a parametric (linear or nonlinear) regression function and allow responses to be "missing at random." We assume that the errors have mean zero and are independent of the covariates. In order to estimate expectations of functions of the covariate and response, we use a fully imputed estimator, namely an empirical estimator based on estimators of conditional expectations given the covariate. We exploit the independence of covariates and errors by writing the conditional expectations as unconditional expectations, which can now be estimated by empirical plug-in estimators. The mean zero constraint on the error distribution is exploited by adding suitable residual-based weights. We prove that the estimator is efficient (in the sense of Hájek and Le Cam) if an efficient estimator of the parameter is used. Our results give rise to new efficient estimators of smooth transformations of expectations. Estimation of the mean response is discussed as a special (degenerate) case. Comment: Published in the Annals of Statistics by the Institute of Mathematical Statistics.
    The Annals of Statistics 08/2009; · 2.53 Impact Factor
  • ABSTRACT: We revisit the second-order nonlinear least squares estimator proposed in Wang and Leblanc (Ann Inst Stat Math 60:883-900, 2008) and show that the estimator attains asymptotic optimality in terms of estimation variability. Using a fully semiparametric approach, we further modify and extend the method to heteroscedastic error models and propose a semiparametric efficient estimator in this more general setting. Numerical results are provided to support the theory and illustrate the finite sample performance of the proposed estimator.
    Annals of the Institute of Statistical Mathematics 08/2012; 64(4):751-764. · 0.74 Impact Factor
