Efficient Estimation of Population-Level Summaries in General Semiparametric Regression Models

Texas A&M University, College Station, Texas, United States
Journal of the American Statistical Association (Impact Factor: 1.98). 02/2007; 102(March):123-139. DOI: 10.2307/27639826
Source: RePEc


This paper considers a wide class of semiparametric regression models in which interest focuses on population-level quantities that combine both the parametric and nonparametric parts of the model. Special cases of this approach include generalized partially linear models, generalized partially linear single index models, structural measurement error models and many others. For estimating the parametric part of the model efficiently, profile likelihood kernel estimation methods are well established in the literature. Here our focus is on estimating general population-level quantities that combine the parametric and nonparametric parts of the model, e.g., the population mean or probabilities. We place this problem into a general context, provide a general kernel-based methodology, and derive the asymptotic distributions of estimates of these population-level quantities, showing that in many cases the estimates are semiparametric efficient. For estimating the population mean with no missing data, we show that the sample mean is semiparametric efficient for canonical exponential families, but not in general. We apply the methods to a problem in nutritional epidemiology, where estimating the distribution of usual intake is of primary interest and semiparametric methods are not available. Extensions to the case of missing response data are also discussed.
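To make the plug-in idea concrete, here is a minimal sketch for the simplest special case, assuming a partially linear model $Y = X\beta + \theta(Z) + \varepsilon$ with identity link and scalar $X$. As a simple stand-in for profile likelihood kernel estimation it uses a Speckman-type partial-residual estimator with a Nadaraya-Watson smoother; the function names, Gaussian kernel, bandwidth $h = 0.1$ and simulated data are illustrative assumptions, not the paper's procedure. The population mean is then estimated by averaging the fitted values $X_i\hat\beta + \hat\theta(Z_i)$ and compared with the sample mean.

```python
import math
import random


def nw_smooth(z0, z, v, h):
    """Nadaraya-Watson (local-constant) estimate of E[v | Z = z0] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((zi - z0) / h) ** 2) for zi in z]
    total = sum(weights)
    return sum(w * vi for w, vi in zip(weights, v)) / total


def fit_partially_linear(x, y, z, h):
    """Speckman-type fit of Y = X*beta + theta(Z) + eps with scalar X.

    Returns beta_hat and a function theta_hat(z0)."""
    mx = [nw_smooth(zi, z, x, h) for zi in z]   # smooth X on Z
    my = [nw_smooth(zi, z, y, h) for zi in z]   # smooth Y on Z
    rx = [xi - mi for xi, mi in zip(x, mx)]
    ry = [yi - mi for yi, mi in zip(y, my)]
    # Least squares on the Z-adjusted residuals gives beta_hat.
    beta_hat = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
    partial = [yi - xi * beta_hat for xi, yi in zip(x, y)]

    def theta_hat(z0):
        # Smooth the partial residuals Y - X*beta_hat to estimate theta(z0).
        return nw_smooth(z0, z, partial, h)

    return beta_hat, theta_hat


def plug_in_mean(x, z, beta_hat, theta_hat):
    """Plug-in estimate of E(Y): average of the fitted means X_i*beta_hat + theta_hat(Z_i)."""
    return sum(xi * beta_hat + theta_hat(zi) for xi, zi in zip(x, z)) / len(x)


# Hypothetical simulated data: beta = 1.5, theta(z) = sin(2*pi*z), E(X) = 1, so E(Y) = 1.5.
rng = random.Random(2007)
n, beta, h = 400, 1.5, 0.1
z = [rng.random() for _ in range(n)]
x = [rng.gauss(1.0, 1.0) for _ in range(n)]
y = [xi * beta + math.sin(2 * math.pi * zi) + rng.gauss(0.0, 0.5) for xi, zi in zip(x, z)]

beta_hat, theta_hat = fit_partially_linear(x, y, z, h)
mu_plug_in = plug_in_mean(x, z, beta_hat, theta_hat)
mu_sample = sum(y) / n
print(f"beta_hat={beta_hat:.3f}  plug-in mean={mu_plug_in:.3f}  sample mean={mu_sample:.3f}")
```

Because the Gaussian model with identity link is a canonical exponential family, the abstract's result suggests the sample mean is already efficient in this setting, so the two estimates should be close; for non-canonical links the plug-in estimate is the one the general theory targets.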



Available from: Arnab Maity, Oct 13, 2015
    • "discusses this issue. Also see Maity, et al. (2007) for other comments on bandwidth estimation in semiparametric models. "
    ABSTRACT: We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.
    Statistics in Biosciences 05/2009; 1(1):10-31. DOI:10.1007/s12561-009-9000-7
    • "Estimation of $B_0$ and $\theta_0(\cdot)$ using profile likelihood and backfitting methods is discussed in detail in Lin and Carroll (2006) and Claeskens and Carroll (2007) when there are no missing response data: see also Maity, et al. (2007) for detailed discussion of technical assumptions. "
    ABSTRACT: The problem of quantile estimation in general semiparametric regression models is considered. We derive plug-in kernel-based estimators, investigate their asymptotic distribution and establish the semiparametric efficiency of these estimators under mild assumptions. We apply our methodology in an example in nutritional epidemiology. The generalization to the important case where responses are missing at random is also addressed.
    Statistics & Probability Letters 11/2008; 78(16):2744-2750. DOI:10.1016/j.spl.2008.03.022 · 0.60 Impact Factor
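The quantile paper above builds plug-in kernel-based estimators. One way such a plug-in can look, assuming a homoscedastic model $Y = m(X, Z) + \varepsilon$ with $\varepsilon$ independent of the covariates, is to use $P(Y \le y) = E\{F_\varepsilon(y - m(X, Z))\}$, replace $m$ by the fitted means and $F_\varepsilon$ by the empirical residual distribution, and invert the resulting CDF. This is a minimal sketch under that assumption, not the paper's exact construction; the function names and the toy fitted values and residuals are hypothetical.

```python
import math


def plug_in_cdf(y0, fitted, residuals):
    """Plug-in estimate of P(Y <= y0): average over all pairs (i, j)
    of the indicator fitted_i + residual_j <= y0."""
    count = sum(1 for mi in fitted for ej in residuals if mi + ej <= y0)
    return count / (len(fitted) * len(residuals))


def plug_in_quantile(p, fitted, residuals):
    """Smallest y whose plug-in CDF is at least p, for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    values = sorted(mi + ej for mi in fitted for ej in residuals)
    # Subtract a tiny tolerance so p * N is not rounded up by floating-point error.
    k = max(math.ceil(p * len(values) - 1e-12) - 1, 0)
    return values[k]


fitted = [1.0, 2.0]      # hypothetical fitted means X_i*beta_hat + theta_hat(Z_i)
residuals = [-0.5, 0.5]  # hypothetical residuals Y_j - fitted_j
print(plug_in_cdf(1.5, fitted, residuals))       # 0.75
print(plug_in_quantile(0.5, fitted, residuals))  # 1.5
```

Sorting all $n^2$ pairwise sums is fine for moderate samples; for large $n$ a root-finder applied to `plug_in_cdf` would avoid the quadratic memory cost.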
    • "The restriction $nh^4 \to 0$ removes the $O_p(h^2)$ bias term. It is suggested in Sepanski et al. [4] that in many semiparametric problems, the optimal bandwidth for estimating parameters such as $B_0$ is of the order $n^{-1/3}$, which of course satisfies $nh^4 \to 0$. In the case of no repeated measures, i.e., when $m = 1$, Maity et al. [3] find that they have good experience numerically by first estimating the bandwidth via likelihood cross-validation, which will be of order $n^{-1/5}$, and then multiplying it by $n^{-2/15}$ to get the bandwidth to be of order $n^{-1/3}$. "
    ABSTRACT: This paper considers a wide family of semiparametric repeated measures regression models, in which the main interest is on estimating population-level quantities such as mean, variance, probabilities etc. Examples of our framework include generalized linear models for clustered/longitudinal data, among many others. We derive plug-in kernel-based estimators of the population level quantities and derive their asymptotic distribution. An example involving estimation of the survival function of hemoglobin measures in the Kenya hemoglobin study data is presented to demonstrate our methodology.
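The bandwidth recipe quoted in the last entry is exponent arithmetic: $n^{-1/5} \cdot n^{-2/15} = n^{-3/15 - 2/15} = n^{-1/3}$, and with $h \asymp n^{-1/3}$ we get $nh^4 \asymp n^{-1/3} \to 0$, as the restriction requires. A minimal sketch, assuming the likelihood cross-validated bandwidth `h_cv` has already been computed (the cross-validation step itself is not shown, and the function name `undersmooth` is hypothetical):

```python
import math


def undersmooth(h_cv: float, n: int) -> float:
    """Multiply a cross-validated bandwidth of order n^(-1/5) by n^(-2/15),
    giving a bandwidth of order n^(-1/5 - 2/15) = n^(-1/3)."""
    return h_cv * n ** (-2.0 / 15.0)


n = 1000
h_cv = n ** (-1.0 / 5.0)      # hypothetical CV bandwidth with leading constant 1
h = undersmooth(h_cv, n)
print(round(h, 6))            # 0.1, i.e. 1000^(-1/3)
print(round(n * h ** 4, 6))   # 0.1, i.e. n^(-1/3), which tends to 0 as n grows
```

Only the rate changes: whatever constant cross-validation picks for `h_cv` carries over unchanged to the rescaled bandwidth.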