Pooling biospecimens and limits of detection: Effects on ROC curve analysis

Harvard University, Cambridge, Massachusetts, United States
Biostatistics (Impact Factor: 2.65). 11/2006; 7(4):585-98. DOI: 10.1093/biostatistics/kxj027
Source: PubMed


Frequently, epidemiological studies deal with two restrictions in the evaluation of biomarkers: cost and instrument sensitivity.
Costs can hamper the evaluation of the effectiveness of new biomarkers. In addition, many assays are affected by a limit of
detection (LOD), depending on the instrument sensitivity. Two common strategies used to cut costs include taking a random
sample of the available samples and pooling biospecimens. We compare the two sampling strategies when an LOD effect exists.
These strategies are compared by examining the efficiency of receiver operating characteristic (ROC) curve analysis, specifically
the estimation of the area under the ROC curve (AUC) for normally distributed markers. We propose and examine a method to
estimate AUC when dealing with data from pooled and unpooled samples where an LOD is in effect. In conclusion, pooling is
the most efficient cost-cutting strategy when the LOD affects less than 50% of the data. However, when much more than 50%
of the data are affected, utilization of the pooling design is not recommended.
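
The 50% threshold in the conclusion can be illustrated with a small sketch. It is not the estimation method proposed in the paper, which the abstract does not spell out. It uses only the standard closed form for the AUC of two independent normal markers, AUC = Φ((µ_y − µ_x) / sqrt(σ_x² + σ_y²)), and assumes that a pooled assay returns the arithmetic mean of its specimens with no pooling or measurement error. The function names and parameters are hypothetical and chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the Phi(.) function


def binormal_auc(mu_x, sigma_x, mu_y, sigma_y):
    """AUC = P(Y > X) for independent X ~ N(mu_x, sigma_x^2), Y ~ N(mu_y, sigma_y^2)."""
    return _STD_NORMAL.cdf((mu_y - mu_x) / sqrt(sigma_x ** 2 + sigma_y ** 2))


def fraction_below_lod(mu, sigma, lod, pool_size=1):
    """Probability that a single assay falls below the LOD.

    Assumption: a pooled assay is the mean of `pool_size` independent
    specimens, so it is distributed N(mu, sigma^2 / pool_size).
    """
    return NormalDist(mu, sigma / sqrt(pool_size)).cdf(lod)


def auc_from_pooled_params(mu_x, sd_pooled_x, mu_y, sd_pooled_y, pool_size):
    """Individual-level AUC from parameters estimated on pooled assays.

    Under the same assumption, pooling leaves the mean unchanged and
    divides the variance by pool_size, so sd_individual = sd_pooled * sqrt(pool_size).
    """
    scale = sqrt(pool_size)
    return binormal_auc(mu_x, sd_pooled_x * scale, mu_y, sd_pooled_y * scale)


if __name__ == "__main__":
    # LOD below the mean (under 50% censored): pooling lowers the censored fraction.
    # LOD above the mean (over 50% censored): pooling raises it.
    for lod in (-0.5, 0.5):
        single = fraction_below_lod(0.0, 1.0, lod)
        pooled = fraction_below_lod(0.0, 1.0, lod, pool_size=4)
        print(f"LOD={lod:+.1f}  individual={single:.3f}  pooled(p=4)={pooled:.3f}")
```

Because averaging shrinks the spread around an unchanged mean, a pooled value is less likely than an individual value to fall below an LOD that lies under the mean, and more likely to fall below an LOD that lies above it. For a normal marker the mean equals the median, which is consistent with the abstract's 50% cutoff.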

Available from: Albert Vexler
    • "A hybrid design is more efficient than only measuring pooled assays or only measuring individual assays. When LLOD < µ_x, the traditional pooling design (α = 0) is more efficient than simple random sampling [Vexler, Liu and Schisterman (2006), Mumford et al. (2006)]. However, when a pooled-unpooled hybrid design is applicable, when LLOD ≤ µ_x and the objective is the estimate µ_x, we recommend a one-pool design given that pooling and measurement errors are negligible."
    ABSTRACT: Pooling specimens, a well-accepted sampling strategy in biomedical research, can be applied to reduce the cost of studying biomarkers. Even if the cost of a single assay is not a major restriction in evaluating biomarkers, pooling can be a powerful design that increases the efficiency of estimation based on data that are censored due to an instrument's lower limit of detection (LLOD). However, there are situations when the pooling design strongly aggravates the detection limit problem. To combine the benefits of pooled assays and individual assays, hybrid designs that involve taking a sample of both pooled and individual specimens have been proposed. We examine the efficiency of these hybrid designs in estimating parameters of two systems subject to an LLOD: (1) a normally distributed biomarker with normally distributed measurement error and pooling error; (2) a gamma-distributed biomarker with double exponentially distributed measurement error and pooling error. A three-assay design and a two-assay design with replicates are applied to estimate the measurement and pooling error. The maximum likelihood method is used to estimate the parameters. We found that the simple one-pool design, where all assays but one are random individuals and a single pooled assay includes the remaining specimens, is very efficient under plausible conditions and can be recommended for practical use.
    Full-text · Article · Feb 2012 · The Annals of Applied Statistics
    • "The statistical implications of pooling measurements across subjects have been considered for biomarker studies (Vexler et al., 2006; Schisterman and Vexler, 2008; Schisterman et al., 2010), diagnostic testing (Faraggi et al., 2003; Liu and Schisterman, 2003; Mumford et al., 2006; Vexler et al., 2008), and longitudinal outcomes (Albert and Shih, 2009). In the Upstate KIDS study, the variables subject to pooling (e.g., PCB) will likely be treated as covariates in regression models for describing and explaining developmental problems. "
    ABSTRACT: It has become increasingly common in epidemiological studies to pool specimens across subjects to achieve accurate quantitation of biomarkers and certain environmental chemicals. In this article, we consider the problem of fitting a binary regression model when an important exposure is subject to pooling. We take a regression calibration approach and derive several methods, including plug-in methods that use a pooled measurement and other covariate information to predict the exposure level of an individual subject, and normality-based methods that make further adjustments by assuming normality of calibration errors. Within each class we propose two ways to perform the calibration (covariate augmentation and imputation). These methods are shown in simulation experiments to effectively reduce the bias associated with the naive method that simply substitutes a pooled measurement for all individual measurements in the pool. In particular, the normality-based imputation method performs reasonably well in a variety of settings, even under skewed distributions of calibration errors. The methods are illustrated using data from the Collaborative Perinatal Project.
    Full-text · Article · Jun 2011 · Biometrics
    • ", n, j = 1, ..., m and d_x, d_y are the values of the LOD (e.g., Lynn, 2001; Lubin et al., 2004; Mumford et al., 2006; Schisterman et al., 2006; Vexler et al., 2006)."
    ABSTRACT: In this article, we consider comparing the areas under correlated receiver operating characteristic (ROC) curves of diagnostic biomarkers whose measurements are subject to a limit of detection (LOD), a source of measurement error from instruments' sensitivity in epidemiological studies. We propose and examine the likelihood ratio tests with operating characteristics that are easily obtained by classical maximum likelihood methodology.
    Full-text · Article · Dec 2007 · Biometrics