Roderick J. A. Little’s research while affiliated with University of Michigan–Ann Arbor and other places


Publications (63)


Nonresponse Bias Analysis in Longitudinal Studies: A Comparative Review with an Application to the Early Childhood Longitudinal Study
  • Article

March 2024 · 13 Reads · 2 Citations · International Statistical Review

Roderick J.A. Little · Ya Mo
Longitudinal studies are subject to nonresponse when individuals fail to provide data for entire waves or particular questions of the survey. We compare approaches to nonresponse bias analysis (NRBA) in longitudinal studies and illustrate them on the Early Childhood Longitudinal Study, Kindergarten Class of 2010–2011 (ECLS‐K:2011). Wave nonresponse with attrition often yields a monotone missingness pattern, and the missingness mechanism can be missing at random (MAR) or missing not at random (MNAR). We discuss weighting, multiple imputation (MI), incomplete data modelling and Bayesian approaches to NRBA for monotone patterns. Weighting adjustments can be effective when the constructed weights are correlated with the survey outcome of interest. MI allows for variables with missing values to be included in the imputation model, yielding potentially less biased and more efficient estimates. We add offsets in the MAR results to provide sensitivity analyses to assess MNAR deviations. We conduct NRBA for descriptive summaries and analytic model estimates in the ECLS‐K:2011 application. The strength of evidence about our NRBA depends on the strength of the relationship between the fully observed variables and the key survey outcomes, so the key to a successful NRBA is to include strong predictors.
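The offset-based sensitivity analysis described above can be sketched in a few lines: impute the missing wave under a MAR regression model, then shift the imputed values by a multiple δ of the residual standard deviation to probe MNAR departures (δ = 0 recovers MAR). All data and parameter values below are simulated for illustration, not taken from the ECLS-K:2011.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-wave data: y1 fully observed, y2 missing for wave-2 dropouts
# (y2 is generated in full here only so the mask can be applied afterwards)
n = 1000
y1 = rng.normal(50, 10, n)
y2 = 0.8 * y1 + rng.normal(0, 5, n)
observed = rng.random(n) < 1 / (1 + np.exp(-(y1 - 50) / 10))  # MAR: depends on y1 only

# MAR imputation model: regression of y2 on y1 among wave-2 respondents
X = np.column_stack([np.ones(observed.sum()), y1[observed]])
beta, *_ = np.linalg.lstsq(X, y2[observed], rcond=None)
resid_sd = np.std(y2[observed] - X @ beta)

def estimate_mean_y2(delta, n_imp=20):
    """Multiply impute y2 under MAR, shifting imputations by delta * resid_sd (MNAR offset)."""
    means = []
    for _ in range(n_imp):
        miss = ~observed
        y2_imp = y2.copy()
        draws = beta[0] + beta[1] * y1[miss] + rng.normal(0, resid_sd, miss.sum())
        y2_imp[miss] = draws + delta * resid_sd  # delta = 0 recovers the MAR analysis
        means.append(y2_imp.mean())
    return float(np.mean(means))

for delta in (-0.5, 0.0, 0.5):  # offsets spanning plausible MNAR departures
    print(f"delta={delta:+.1f}: estimated mean of y2 = {estimate_mean_y2(delta):.2f}")
```

Reporting the estimate across a grid of offsets, as in the last loop, shows how far conclusions move as the assumed MNAR departure grows.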


A Case Study of Nonresponse Bias Analysis in Educational Assessment Surveys

December 2022 · 13 Reads · 5 Citations · Journal of Educational and Behavioral Statistics

Nonresponse bias is a widely prevalent problem for data on education. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010–2011. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA improves on existing methods by incorporating both missing at random and missing not at random mechanisms, and all analyses can be done straightforwardly with standard statistical software.


Addressing Disparities in the Propensity Score Distributions for Treatment Comparisons from Observational Studies
  • Article
  • Full-text available

Figure 1. Propensity score distributions for the treated and the controls.
Table: MACS dataset, estimated one-year change in CD4 count obtained using different methods. Standard errors were calculated from 2000 bootstrap samples. ATE = average treatment effect; ATM = average treatment effect on an evenly matchable set; ATO = average treatment effect on the overlap population; TATE_α = truncated average treatment effect with truncation at a pre-defined α level.

December 2022 · 41 Reads · 1 Citation · Stats
Propensity score (PS)-based methods, such as matching, stratification, regression adjustment, and simple and augmented inverse probability weighting, are popular for controlling for observed confounders in observational studies of causal effects. More recently, we proposed the penalized spline of propensity prediction (PENCOMP), which multiply imputes outcomes for unassigned treatments using a regression model that includes a penalized spline of the estimated selection probability and other covariates. For PS methods to work reliably, there should be sufficient overlap in the propensity score distributions between treatment groups. Limited overlap can result in fewer subjects being matched, or in extreme weights causing numerical instability and bias in causal estimation. The problem of limited overlap suggests (a) defining alternative estimands that restrict inferences to subpopulations where all treatments have the potential to be assigned, and (b) excluding or down-weighting sample cases where the propensity to receive one of the compared treatments is close to zero. We compared PENCOMP and other PS methods for estimation of alternative causal estimands when limited overlap occurs. Simulations suggest that, when there are extreme weights, PENCOMP tends to outperform the weighted estimators for the ATE and performs similarly to the weighted estimators for the alternative estimands. We illustrate PENCOMP in two applications: the effect of antiretroviral treatments on CD4 counts using the Multicenter AIDS Cohort Study (MACS), and whether right heart catheterization (RHC) is beneficial in treating critically ill patients.
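A minimal sketch of the overlap idea discussed above: when estimated propensities approach 0 or 1, the inverse-probability weights for the ATE explode, while the overlap weights underlying the ATO estimand stay bounded. The data-generating values are hypothetical, and PENCOMP itself is not implemented here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Simulated observational data with a single confounder x
n = 2000
x = rng.normal(size=n)
treat = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))   # strong confounding -> limited overlap
y = 2.0 * treat + x + rng.normal(size=n)              # true treatment effect = 2

ps = LogisticRegression().fit(x[:, None], treat).predict_proba(x[:, None])[:, 1]

# IPW for the ATE: weights 1/ps and 1/(1 - ps) explode when ps nears 0 or 1
w_ate = np.where(treat == 1, 1 / ps, 1 / (1 - ps))
ate = (np.average(y[treat == 1], weights=w_ate[treat == 1])
       - np.average(y[treat == 0], weights=w_ate[treat == 0]))

# Overlap weights for the ATO: 1 - ps (treated) and ps (controls) stay bounded
w_ato = np.where(treat == 1, 1 - ps, ps)
ato = (np.average(y[treat == 1], weights=w_ato[treat == 1])
       - np.average(y[treat == 0], weights=w_ato[treat == 0]))
print(round(ate, 2), round(ato, 2))
```

With a constant treatment effect, the ATE and ATO coincide; the ATO estimate is typically the more stable of the two because no case receives an extreme weight.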


Multiple Imputation with Massive Data: An Application to the Panel Study of Income Dynamics

October 2021 · 31 Reads · 11 Citations · Journal of Survey Statistics and Methodology

Steve Heeringa · David Johnson · [...]
Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled with traditional hot deck methods because of their simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faces operational challenges. We use a sequential regression/chained-equations approach, implemented in the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI improves on the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between household net worth and sociodemographic factors, and it facilitates general-purpose completed-data analyses. MI incorporates highly predictive covariates into imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
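IVEware's sequential regression approach is not reproduced here, but the chained-equations idea can be sketched with scikit-learn's IterativeImputer, which likewise regresses each incomplete variable on the others in turn. The variable names and distributions below are invented stand-ins for correlated wealth components.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)

# Hypothetical correlated "wealth components" with values set missing at random
n = 500
home = rng.normal(200, 50, n)
stocks = 0.5 * home + rng.normal(0, 20, n)
debt = -0.3 * home + rng.normal(0, 10, n)
data = np.column_stack([home, stocks, debt])
mask = rng.random(data.shape) < 0.2          # 20% of cells missing
data_miss = np.where(mask, np.nan, data)

# Chained equations: each variable is regressed on the others, cycling until stable;
# sample_posterior=True draws imputations rather than plugging in predictions
imp = IterativeImputer(sample_posterior=True, max_iter=10, random_state=0)
completed = imp.fit_transform(data_miss)

# Unlike a univariate hot deck, the imputations help preserve cross-variable correlation
print(round(np.corrcoef(completed[:, 0], completed[:, 1])[0, 1], 2))
```

Drawing from the posterior predictive distribution (rather than imputing conditional means) is what makes the completed data suitable for repeated-imputation variance estimation.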


Fig. 1. Median standardized error measure (SEM, y-axes) against value of diagnostic (x-axes) for twelve candidate diagnostics (columns) and three values of r = Cor(X1, Y) (rows), using the median of 2000 simulated data sets. k = Cor(X1, X2) is fixed at 1 (Figures 2 and 3 give the same results for k = 0.5 and k = 0, respectively). For reference, the y = x line is plotted in black. Shape and grayscale indicate different true selection mechanisms from Table 2, and connected segments represent different values of {b_x, b_y} corresponding to the same selection mechanism.
A Simulation Study of Diagnostics for Selection Bias

September 2021 · 91 Reads · 10 Citations · Journal of Official Statistics

A non-probability sampling mechanism arising from nonresponse or non-selection is likely to bias estimates of parameters with respect to a target population of interest. This bias poses a unique challenge when selection is ‘non-ignorable’, that is, dependent on the unobserved outcome of interest, since it is then undetectable and thus cannot be ameliorated. We extend a simulation study by Nishimura et al. (2016) adding two recently published statistics: the ‘standardized measure of unadjusted bias’ (SMUB) and ‘standardized measure of adjusted bias’ (SMAB), which explicitly quantify the extent of bias (in the case of SMUB) or nonignorable bias (in the case of SMAB) under the assumption that a specified amount of nonignorable selection exists. Our findings suggest that this new sensitivity diagnostic is more correlated with, and more predictive of, the true, unknown extent of selection bias than other diagnostics, even when the underlying assumed level of non-ignorability is incorrect.


Figure 1. Propensity score distributions between the treated (grey) and control (black): (A) including all covariates in the propensity score model, π^0.95_{z=1} = 18% and π^0.95_{z=0} = 22%; (B) including only the covariates that were selected more than 20% of the time by Step_ALT among 1000 bootstrap samples, π^0.95_{z=1} = 33% and π^0.95_{z=0} = 49%.
Table: 1000 × RMSE with a sample size of 1000. The treatment effect is η = 2. S1, S2, and S3 denote scenarios 1, 2, and 3, respectively.
Robust Causal Estimation from Observational Studies Using Penalized Spline of Propensity Score for Treatment Comparison

June 2021 · 30 Reads · 3 Citations · Stats

Without randomization of treatments, valid inference of treatment effects from observational studies requires controlling for all confounders because the treated subjects generally differ systematically from the control subjects. Confounding control is commonly achieved using the propensity score, defined as the conditional probability of assignment to a treatment given the observed covariates. The propensity score collapses all the observed covariates into a single measure and serves as a balancing score such that the treated and control subjects with similar propensity scores can be directly compared. Common propensity score-based methods include regression adjustment and inverse probability of treatment weighting using the propensity score. We recently proposed a robust multiple imputation-based method, penalized spline of propensity for treatment comparisons (PENCOMP), that includes a penalized spline of the assignment propensity as a predictor. Under the Rubin causal model assumptions that there is no interference across units, that each unit has a non-zero probability of being assigned to either treatment group, and there are no unmeasured confounders, PENCOMP has a double robustness property for estimating treatment effects. In this study, we examine the impact of using variable selection techniques that restrict predictors in the propensity score model to true confounders of the treatment-outcome relationship on PENCOMP. We also propose a variant of PENCOMP and compare alternative approaches to standard error estimation for PENCOMP. Compared to the weighted estimators, PENCOMP is less affected by inclusion of non-confounding variables in the propensity score model. We illustrate the use of PENCOMP and competing methods in estimating the impact of antiretroviral treatments on CD4 counts in HIV+ patients.


Bayesian sensitivity analyses for longitudinal data with dropouts that are potentially missing not at random: A high dimensional pattern‐mixture model

June 2021 · 19 Reads · 4 Citations · Statistics in Medicine

Randomized clinical trials with outcomes measured longitudinally are frequently analyzed using either random effects models or generalized estimating equations. Both approaches assume that the dropout mechanism is missing at random (MAR) or missing completely at random (MCAR). We propose a Bayesian pattern-mixture model to incorporate missingness mechanisms that might be missing not at random (MNAR), where the distribution of the outcome measure at follow-up time t_k, conditional on the prior history, differs across the patterns of missing data. We then perform sensitivity analysis on estimates of the parameters of interest. The sensitivity parameters relate the distribution of the outcome of interest between subjects from a missing-data pattern at time t_k with that of the observed subjects at time t_k. The large number of sensitivity parameters is reduced by treating them as random, with a prior distribution having some pre-specified mean and variance, which are varied to explore the sensitivity of inferences. The MAR mechanism is a special case of the proposed model, allowing a sensitivity analysis of deviations from MAR. The proposed approach is applied to data from the Trial of Preventing Hypertension.
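The random sensitivity-parameter device can be sketched directly: in each posterior draw, a shift δ for the dropouts is drawn from a prior N(μ_δ, σ_δ²); setting μ_δ = σ_δ = 0 recovers MAR, and increasing σ_δ widens the resulting interval. The single-arm, single-wave setup below is a deliberate simplification of the longitudinal trial setting, with all values simulated.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated follow-up outcome; about 30% of subjects drop out and have it missing
n = 600
y = rng.normal(0.0, 1.0, n)
dropout = rng.random(n) < 0.3
obs = y[~dropout]            # only these values would be seen in practice

def mean_draws(delta_mean, delta_sd, n_draws=2000):
    """Draws of the overall mean, with dropouts' outcomes shifted by a random
    sensitivity parameter delta ~ N(delta_mean, delta_sd**2) in each draw."""
    draws = np.empty(n_draws)
    for i in range(n_draws):
        delta = rng.normal(delta_mean, delta_sd)
        y_miss = rng.normal(obs.mean() + delta, obs.std(), dropout.sum())
        draws[i] = np.concatenate([obs, y_miss]).mean()
    return draws

mar = mean_draws(0.0, 0.0)    # delta degenerate at 0: the MAR special case
mnar = mean_draws(-0.3, 0.2)  # prior belief: dropouts worse on average, with uncertainty
print(round(mar.mean(), 3), np.round(np.percentile(mnar, [2.5, 97.5]), 3))
```

Varying the prior mean shifts the point estimate, while the prior variance propagates the analyst's uncertainty about the MNAR mechanism into a wider interval, which is the intended behavior of the sensitivity analysis.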


A Case Study of Nonresponse Bias Analysis

April 2021 · 13 Reads

Nonresponse bias is a widely prevalent problem for data collections. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010–11. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA incorporates missing at random and missing not at random mechanisms, and all analyses can be done straightforwardly with standard statistical software.



A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome

January 2021 · 57 Reads · 12 Citations · Statistics in Medicine

We consider comparative effectiveness research (CER) from observational data with two or more treatments. In observational studies, the estimation of causal effects is prone to bias due to confounders related to both treatment and outcome. Methods based on propensity scores are routinely used to correct for such confounding biases. A large fraction of propensity score methods in the current literature consider the case of either two treatments or a continuous outcome. There is an extensive literature on multiple treatments and on binary outcomes separately, but interest often lies in their intersection, for which the literature is still evolving. The contribution of this article is to focus on this intersection and compare across methods, some of which are fairly recent. We describe propensity-based methods when more than two treatments are being compared and the outcome is binary. We assess the relative performance of these methods through a set of simulation studies. The methods are applied to assess the effect of four common therapies for castration-resistant advanced-stage prostate cancer. The data consist of medical and pharmacy claims from a large national private health insurance network, with the adverse outcome being admission to the emergency room within a short time window of treatment initiation.
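The generalized-propensity-score building block common to the compared methods can be sketched as follows: fit a multinomial logistic model for treatment assignment, then weight each arm by the inverse of its estimated assignment probability to estimate marginal outcome probabilities. The three-treatment data below are simulated, not the prostate cancer claims data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

# Simulated data: three treatments, one confounder x, binary adverse outcome y
n = 3000
x = rng.normal(size=n)
logits = np.column_stack([np.zeros(n), 0.8 * x, -0.8 * x])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
t = np.array([rng.choice(3, p=row) for row in probs])     # confounded assignment
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * (t == 1) + 1.0 * (t == 2) + x))))

# Generalized propensity scores: one estimated assignment probability per arm
gps = LogisticRegression(max_iter=1000).fit(x[:, None], t).predict_proba(x[:, None])

# IPTW (Hajek) estimate of each arm's marginal probability of the adverse outcome
marginal = []
for arm in range(3):
    w = (t == arm) / gps[:, arm]
    marginal.append((w * y).sum() / w.sum())
    print(f"arm {arm}: estimated P(Y=1) = {marginal[arm]:.3f}")
```

Pairwise causal contrasts between arms are then differences (or ratios) of these weighted marginal probabilities; the alternative methods compared in the article differ mainly in how they use or stabilize these scores.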


Citations (51)


... , N is defined as w_i = 1/π_i. Design weights are often adjusted, for example, for survey nonresponse [SLMS23, SLMS24]. For now, we work only with design weights. ...

Reference:

Differentially Private Finite Population Estimation via Survey Weight Regularization
Nonresponse Bias Analysis in Longitudinal Studies: A Comparative Review with an Application to the Early Childhood Longitudinal Study
  • Citing Article
  • March 2024

International Statistical Review

... This indicates that respondents may have been more motivated to engage in nutrition education, which may have overestimated positive attitudes toward nutrition and the ability to learn and retain information. Future studies can address non-response bias using mandatory rather than voluntary participation [58] or collecting additional data from non-respondents to conduct non-response bias analyses [59,60]. ...

A Case Study of Nonresponse Bias Analysis in Educational Assessment Surveys
  • Citing Article
  • December 2022

Journal of Educational and Behavioral Statistics

... In the past years, several methods have been recommended for simplifying imputation models in large studies (e.g., Costantini, Lang, Reeskens, & Sijtsma, 2023;Si et al., 2023;Zhao & Long, 2016). The purpose of this article is to propose an alternative strategy that extends conventional imputation approaches with dimension reduction techniques such as partial least squares (PLS) regression (Wold et al., 2001; see also Hastie et al., 2009). ...

Multiple Imputation with Massive Data: An Application to the Panel Study of Income Dynamics
  • Citing Article
  • October 2021

Journal of Survey Statistics and Methodology

... To compute the index, we need a set of auxiliary variables describing the variable of interest that are available for all units in the target population. Little et al. (2019), Andridge et al. (2019) and Boonstra et al. (2021) evaluated the index in different simulation studies using artificially generated data as well as empirical data sets and showed generally good performance of the measure. However, for the index to properly detect selection bias, there are some requirements on the non-probability data and auxiliary variables that may not be met in real-world applications, or at least may not be testable. ...

A Simulation Study of Diagnostics for Selection Bias

Journal of Official Statistics

... Weighting the residuals by the inverse propensity is less efficient than PSPP. In simulations I have found PSPP to be similar to AIPW in terms of bias reduction, but potentially more efficient, particularly when the weights in AIPW are highly variable (Zhou et al., 2019, 2021). ...

Robust Causal Estimation from Observational Studies Using Penalized Spline of Propensity Score for Treatment Comparison

Stats

... We will apply advanced techniques, such as "multiple imputations" [68], to impute the missing values. Our experienced team will perform sensitivity analyses to evaluate the robustness of the results under various missing-data scenarios [69]. ...

Bayesian sensitivity analyses for longitudinal data with dropouts that are potentially missing not at random: A high dimensional pattern‐mixture model
  • Citing Article
  • June 2021

Statistics in Medicine

... In the missing-data-imputation framework, the TULA Questionnaire A data can be regarded as incomplete data, whereas Questionnaire B data are considered as complete data. Because the samples of Questionnaires A and B were selected at random, the missingness mechanisms should be missing completely at random (Little and Rubin, 2019). As shown later, the missing rates of our data (sample size of Questionnaire A divided by the sum of the sample sizes of Questionnaires A and B) can be computed to be approximately equal to 95 %. ...

Statistical Analysis with Missing Data
  • Citing Book
  • August 2014

... 2 Background: The Proxy Pattern-Mixture Model Andridge and Little (2011) first introduced the PPMM for assessing the potential for nonresponse bias for continuous outcomes in surveys. Since then, there have been several extensions of the methods to incorporate binary and skewed survey outcome cases (Andridge et al., 2019;Andridge and Little, 2020;Andridge and Thompson, 2015a,b) and to increase robustness (Yang and Little, 2021). This section contains a brief overview of the PPMM in the context of estimating means; we refer interested readers to the aforementioned references for more details. ...

Spline Pattern-Mixture Models for Missing Data
  • Citing Article
  • February 2021

Journal of data science: JDS

... Results after inverse probability weighting can be distorted by extreme propensity scores [35]. Therefore, additional weighting methods were recently introduced to avoid extreme propensity scores. Overlap weighting and matching weights were introduced to provide average treatment effects in the overlap population and the matchable subset, respectively. ...

A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome
  • Citing Article
  • January 2021

Statistics in Medicine

... In the context of missing data, Andridge andLittle (2011, 2020) proposed the PPMM as a measure of the potential impact of nonresponse on survey estimates for both continuous and binary outcome variables. Little et al. (2020) used the PPMM as the basis of indices that measure the degree of potential sampling bias arising from the use of nonprobability samples. Andridge et al. (2019) extended this approach to indices of potential nonignorable selection bias for estimates of population proportions. ...

Measures of the Degree of Departure from Ignorable Sample Selection
  • Citing Article
  • August 2019

Journal of Survey Statistics and Methodology