
A Partially Linear Regression Model for Data from an Outcome-Dependent Sampling Design

University of North Carolina at Chapel Hill, USA.
Applied Statistics (Impact Factor: 1.42). 08/2011; 60(4):559-574. DOI: 10.1111/j.1467-9876.2010.00756.x
Source: PubMed

ABSTRACT: The outcome-dependent sampling scheme has been gaining attention in both the statistical literature and applied fields. Epidemiological and environmental researchers have used it to select observations for more powerful and cost-effective studies. Motivated by a study of the effect of in utero exposure to polychlorinated biphenyls on children's IQ at age 7, in which an important confounding variable has a nonlinear effect, we consider a semiparametric regression model for data from an outcome-dependent sampling scheme in which the relationship between the response and the covariates is only partially parameterized. We propose a penalized spline maximum likelihood estimator (PSMLE) for inference on both the parametric and the nonparametric components and establish their asymptotic properties. Through simulation studies and an analysis of the IQ study data, we compare the proposed estimator with several competing estimators. Practical considerations for implementing these estimators are discussed.
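The core structure of the model, a linear term for the exposure X plus a penalized spline g(Z) for the nonlinear confounder Z, can be sketched as penalized least squares. This is a minimal sketch under stated assumptions: it covers only simple random sampling with Gaussian errors, where penalized least squares coincides with penalized maximum likelihood, and it does not include the outcome-dependent sampling adjustment that defines the PSMLE. The linear truncated-power basis, the knot placement, the penalty value `lam` and all function names are hypothetical illustrations, not the authors' choices.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design_row(x, z, knots):
    """Hypothetical basis: [x, 1, z, (z - k1)_+, ..., (z - kK)_+]."""
    return [x, 1.0, z] + [max(z - k, 0.0) for k in knots]


def fit_partially_linear(xs, zs, ys, knots, lam):
    """Minimize sum (y - beta*x - g(z))^2 + lam * sum(b_k^2).

    Assumption: only the truncated-line coefficients b_k are penalized.
    No outcome-dependent sampling correction: simple random sampling only.
    """
    rows = [design_row(x, z, knots) for x, z in zip(xs, zs)]
    p = len(rows[0])
    lhs = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(3, p):  # penalty matrix: identity on the knot coefficients only
        lhs[i][i] += lam
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    theta = solve(lhs, rhs)

    def g_hat(z):
        # theta[1:] pairs with the spline part [1, z, (z - k)_+, ...] of the basis
        return sum(t * v for t, v in zip(theta[1:], design_row(0.0, z, knots)[1:]))

    return theta[0], g_hat


def simulate(n, seed):
    """Hypothetical data: Y = 2X + sin(2*pi*Z) + N(0, 0.1^2), X ~ N(0, 1), Z ~ U(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + math.sin(2 * math.pi * z) + rng.gauss(0.0, 0.1)
          for x, z in zip(xs, zs)]
    return xs, zs, ys
```

For example, fitting `simulate(200, seed=1)` with ten equally spaced interior knots and `lam=0.01` should give an estimate of β close to 2 and a fitted curve close to sin(2πz). Under an outcome-dependent design the sampled units are selected on Y, so this naive fit would no longer target the population parameters; accounting for that selection is what the paper's likelihood-based estimator addresses.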


Available from: Guoyou Qin, Jun 29, 2015

    Related article:
    ABSTRACT: Multiphase designs and biased sampling designs are two well-recognized approaches to enhancing study efficiency. In this paper, we propose a new, cost-effective sampling design, the two-phase probability dependent sampling (PDS) design, for studies with a continuous outcome. This design enables investigators to use resources efficiently by targeting more informative subjects for sampling. We develop a new semiparametric empirical likelihood inference method to take advantage of data obtained through a PDS design. Simulation results indicate that the proposed sampling scheme, coupled with the proposed estimator, is more efficient and more powerful than both the existing outcome-dependent sampling design and the simple random sampling design with the same sample size. We illustrate the proposed method with a real dataset from an environmental epidemiological study.
    Journal of the Royal Statistical Society Series B (Statistical Methodology) 01/2014; 76(1):197-215. DOI:10.1111/rssb.12029 · 5.72 Impact Factor