Optimal auxiliary-covariate-based two-phase sampling design for semiparametric efficient estimation of a mean or mean difference, with application to clinical trials

Department of Biostatistics, University of Washington, Seattle, WA 98105, U.S.A.
Statistics in Medicine (Impact Factor: 2.04). 03/2014; 33(6). DOI: 10.1002/sim.6006
Source: PubMed

ABSTRACT To estimate the mean or mean difference of an expensive endpoint Y in a clinical trial, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semiparametric efficient estimator is applied. This approach is made efficient by specifying the phase two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it apparently has seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. We perform simulations to identify settings where the optimal approach significantly improves efficiency compared to approaches in current practice. We provide proofs and R code. The optimality results are developed to design an HIV vaccine trial, with the objective of comparing the mean 'importance-weighted' breadth (Y) of the T-cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere from no efficiency gain to a large one (up to 24% in the examples) compared to the approach with the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y | W] is important for realizing the efficiency gain and is aided by an ample phase two sample and a robust fitting method. Copyright © 2013 John Wiley & Sons, Ltd.
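The idea of choosing phase two selection probabilities as functions of W and measurement costs can be illustrated with a minimal sketch. This is not the authors' R code. It assumes W has been discretized into strata, that the phase two contribution to the variance of the efficient estimator of a mean is the sum over strata of p_k * sigma2_k / pi_k, and that the familiar cost-weighted Neyman-type allocation (pi_k proportional to sqrt(sigma2_k / cost_k)) applies with no probability exceeding 1. All function names, parameter names, and numbers are hypothetical.

```python
import math


def optimal_phase_two_probs(p, sigma2, cost, budget):
    """Selection probabilities proportional to sqrt(sigma2_k / cost_k).

    p      : stratum proportions of W (assumed to sum to 1)
    sigma2 : assumed conditional variances Var(Y | W = k)
    cost   : assumed cost of measuring Y in stratum k
    budget : expected phase two cost per enrolled participant
    """
    root = [math.sqrt(s / c) for s, c in zip(sigma2, cost)]
    denom = sum(pk * math.sqrt(s * c) for pk, s, c in zip(p, sigma2, cost))
    probs = [budget * r / denom for r in root]
    if max(probs) > 1.0:
        # The sketch does not handle truncation of probabilities at 1.
        raise ValueError("budget too large for the untruncated allocation")
    return probs


def phase_two_variance_term(p, sigma2, probs):
    """Sum over strata of p_k * sigma2_k / pi_k."""
    return sum(pk * s / pi for pk, s, pi in zip(p, sigma2, probs))


def expected_cost(p, cost, probs):
    """Expected phase two cost per enrolled participant."""
    return sum(pk * c * pi for pk, c, pi in zip(p, cost, probs))


# Hypothetical two-stratum example with equal costs.
p = [0.5, 0.5]
sigma2 = [1.0, 9.0]
cost = [1.0, 1.0]
budget = 0.3

opt = optimal_phase_two_probs(p, sigma2, cost, budget)
srs = [budget / sum(pk * c for pk, c in zip(p, cost))] * len(p)

ratio = phase_two_variance_term(p, sigma2, opt) / phase_two_variance_term(p, sigma2, srs)
print("optimal probabilities:", opt)
print("variance-term ratio (optimal / simple random sampling):", ratio)
```

In this toy setting the stratum with the larger conditional variance is sampled more heavily at the same expected cost. When sigma2_k / cost_k is constant across strata, the allocation reduces to simple random sampling, which matches the abstract's remark that gains depend on variability in the cost-standardized conditional variance.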

  • ABSTRACT: A variant of the case-cohort design is proposed for the situation in which a correlate of the exposure (or prognostic factor) of interest is available for all cohort members, and exposure information is to be collected for a case-cohort sample. The cohort is stratified according to the correlate, and the subcohort is selected by stratified random sampling. A number of possible methods for the analysis of such exposure stratified case-cohort samples are presented, some of their statistical properties developed, and approximate relative efficiency and optimal allocation to the strata discussed. The methods are compared to each other, and to randomly sampled case-cohort studies, in a limited computer simulation study. We found that all of the proposed analysis methods performed well and were more efficient than a randomly sampled case-cohort study.
    Lifetime Data Analysis 04/2000; 6(1):39-58. DOI:10.1023/A:1009661900674 · 0.54 Impact Factor
  • ABSTRACT: This paper addresses optimal design and efficiency of two-phase (2P) case-control studies in which the first phase uses an error-prone exposure measure, Z, while the second phase measures true, dichotomous exposure, X, in a subset of subjects. Optimal design of a separate second phase, to be added to a preexisting study, is also investigated. Differential misclassification is assumed throughout. Results are also applicable to 2P cohort studies with error-prone and error-free measures of disease status but error-free exposure measures. While software based on the mean score method of Reilly and Pepe (1995, Biometrika 82, 299-314) can find optimal designs given pilot data, the lack of simple formulae makes it difficult to generalize about efficiency compared to one-phase (1P) studies based on X alone. Here, formulae for the optimal ratios of cases to controls and first- to second-phase sizes, and the optimal second-phase stratified sampling fractions, given a fixed budget, are given. The maximum efficiency of 2P designs compared to a 1P design is deduced and is shown to be bounded from above by a function of the sensitivities and specificities of Z. The efficiency of 'balanced' separate second-phase designs (Breslow and Cain, 1988, Biometrika 75, 11-20), in which equal numbers of subjects are chosen from each first-phase stratum, compared to the optimal design is deduced, enabling situations where balanced designs are nearly optimal to be identified.
    Biostatistics 11/2005; 6(4):590-603. DOI:10.1093/biostatistics/kxi029 · 2.24 Impact Factor
  • ABSTRACT: Dozens of human immunodeficiency virus-type 1 (HIV-1) vaccine candidates specifically designed to elicit cytotoxic T-lymphocyte (CTL) responses have entered the pipeline of clinical trials. Evaluating the immunogenicity and potential efficacy of these HIV-1 vaccine candidates is challenging in the face of the extensive viral genetic diversity of circulating strains. Standardized peptide reagents to define the magnitude and potential breadth of the T-cell response, especially to circulating strains of HIV-1, are needed. For this purpose we developed a biometric approach based on T-cell recognition patterns for defining standardized reagents. Circulating strains in the Los Alamos database were evaluated and standardized algorithms to define all potential T-cell epitopes (PTEs) were generated. While many unique PTEs could be identified, a finite number based upon prevalence of circulating strains in the database, which we define as vaccine-important PTEs (VIPs), were used to select a common standardized panel of HIV-1 peptides for CTL-based vaccine evaluation. The usability of the PTE peptide set was demonstrated by detection of Nef-specific CTL responses in HIV-1 subtype B infections.
    Vaccine 12/2006; 24(47-48):6893-904. DOI:10.1016/j.vaccine.2006.06.009 · 3.49 Impact Factor