Conditioning on the propensity score can result in biased estimation of common measures of treatment effect: a Monte Carlo study

Institute for Clinical Evaluative Sciences, Toronto, Ont., Canada.
Statistics in Medicine (Impact Factor: 2.04). 02/2007; 26(4):754-68. DOI: 10.1002/sim.2618
Source: PubMed

ABSTRACT: Propensity score methods are increasingly being used to estimate causal treatment effects in the medical literature. Conditioning on the propensity score results in unbiased estimation of the expected difference in observed responses to two treatments. The degree to which conditioning on the propensity score introduces bias into the estimation of the conditional odds ratio or conditional hazard ratio, which are frequently used as measures of treatment effect in observational studies, has not been extensively studied. We conducted Monte Carlo simulations to determine the degree to which propensity score matching, stratification on the quintiles of the propensity score, and covariate adjustment using the propensity score result in biased estimation of conditional odds ratios, hazard ratios, and rate ratios. We found that conditioning on the propensity score resulted in biased estimation of the true conditional odds ratio and the true conditional hazard ratio. In all scenarios examined, estimated treatment effects were biased towards the null. However, conditioning on the propensity score did not result in biased estimation of the true conditional rate ratio. In contrast, conventional regression methods allowed unbiased estimation of the true conditional treatment effect when all variables associated with the outcome were included in the regression model. The observed bias in propensity score methods arises because regression models estimate conditional treatment effects, whereas propensity score methods estimate marginal treatment effects. In several settings with non-linear treatment effects, marginal and conditional treatment effects do not coincide.
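The gap between conditional and marginal effects can be illustrated numerically. The sketch below is a minimal illustration, not the study's simulation design: it assumes a single binary covariate X that is independent of treatment (so there is no confounding) and uses hypothetical coefficient values. It averages the covariate-specific risks (or rates) over X to obtain the marginal effect. Under a logistic model the marginal odds ratio ends up closer to 1 than the conditional odds ratio, which is identical in both strata, whereas under a log-linear (Poisson) model the marginal rate ratio equals the conditional one.

```python
import math


def expit(x: float) -> float:
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_odds_ratio(b0: float, b_treat: float, b_x: float, p_x: float) -> float:
    """Marginal odds ratio implied by the logistic model
    logit P(Y=1 | Z, X) = b0 + b_treat*Z + b_x*X, with binary X, P(X=1) = p_x,
    and X independent of treatment Z (assumed: no confounding)."""
    # Average the covariate-specific risks over the distribution of X.
    risk1 = (1 - p_x) * expit(b0 + b_treat) + p_x * expit(b0 + b_treat + b_x)
    risk0 = (1 - p_x) * expit(b0) + p_x * expit(b0 + b_x)
    return (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))


def marginal_rate_ratio(a0: float, a_treat: float, a_x: float, p_x: float) -> float:
    """Marginal rate ratio implied by the log-linear (Poisson) model
    log E[Y | Z, X] = a0 + a_treat*Z + a_x*X, under the same assumptions."""
    # Average the covariate-specific rates over the distribution of X.
    rate1 = (1 - p_x) * math.exp(a0 + a_treat) + p_x * math.exp(a0 + a_treat + a_x)
    rate0 = (1 - p_x) * math.exp(a0) + p_x * math.exp(a0 + a_x)
    return rate1 / rate0


# Hypothetical values: conditional effect of 2 on both scales, strong covariate effect.
conditional_effect = 2.0
print(marginal_odds_ratio(-1.0, math.log(conditional_effect), 2.0, 0.5))  # about 1.73
print(marginal_rate_ratio(-1.0, math.log(conditional_effect), 2.0, 0.5))  # 2.0 up to rounding
```

Setting `b_x` to 0 (a covariate unrelated to the outcome) makes the marginal and conditional odds ratios coincide. Because this toy model has no confounding at all, the gap it shows reflects the difference between marginal and conditional measures rather than confounding bias; the numbers are illustrative and are not results from the study.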

  • ABSTRACT: Organizational and applied sciences have long struggled to improve causal inference in quasi-experiments. We introduce organizational researchers to propensity scoring, a statistical technique that has become popular in other applied sciences as a means of improving internal validity. Propensity scoring statistically models how individuals in a quasi-experiment have been assigned to conditions in order to estimate treatment effects among individuals with approximately equal probabilities of receiving the treatment. If propensity scores are created from relevant covariates, matching on the propensity score makes treatment assignment ignorable and approximates a true experimental design. We illustrate matching on the propensity score with two examples: examining the effects of online instruction and estimating the benefits of preparatory coaching for the SAT. In both cases, propensity-scoring methods effectively reduced inequivalence between treatment and control groups on many variables. Propensity scoring stands out as a valuable technique capable of improving causal inference in many of organizational research's quasi-experiments.
    Personnel Psychology 06/2013; 66(2). DOI:10.1111/peps.12020 · 2.93 Impact Factor
  • ABSTRACT: The discovery and development of new antimicrobials is critically important, especially as multidrug-resistant bacteria continue to emerge. Little has been written about the epidemiological issues in non-randomized trials aiming to evaluate the superiority of one antibiotic over another. In this manuscript, we outline some of the methodological difficulties in demonstrating superiority and discuss potential approaches to these problems. Many of the difficulties arise from confounding by indication, which we define and explain. Epidemiological methods including restriction, matching, stratification, multivariable regression, propensity scores and instrumental variables are discussed.
    Clinical Infectious Diseases 06/2014; 59(8). DOI:10.1093/cid/ciu486 · 9.42 Impact Factor
  • ABSTRACT: Propensity-score matching is frequently used to estimate the effect of treatments, exposures, and interventions when using observational data. An important issue when using propensity-score matching is how to estimate the standard error of the estimated treatment effect. Accurate variance estimation permits construction of confidence intervals that have the advertised coverage rates and tests of statistical significance that have the correct type I error rates. There is disagreement in the literature as to how standard errors should be estimated. The bootstrap is a commonly used resampling method that permits estimation of the sampling variability of estimated parameters. Bootstrap methods are rarely used in conjunction with propensity-score matching. We propose two different bootstrap methods for use with propensity-score matching without replacement and examine their performance with a series of Monte Carlo simulations. The first method involved drawing bootstrap samples from the matched pairs in the propensity-score-matched sample. The second method involved drawing bootstrap samples from the original sample, estimating the propensity score separately in each bootstrap sample, and creating a matched sample within each of these bootstrap samples. The former approach was found to result in estimates of the standard error that were closer to the empirical standard deviation of the sampling distribution of estimated effects. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
    Statistics in Medicine 10/2014; 33(24). DOI:10.1002/sim.6276 · 2.04 Impact Factor
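Two procedures mentioned in the abstracts above, 1:1 matching on the propensity score and the first bootstrap method (resampling matched pairs from the matched sample), can be sketched together. This is a minimal illustration under stated assumptions, not any author's implementation: the propensity scores are taken as already estimated, and the matching rule (greedy nearest-neighbour matching without replacement within a caliper), the risk-difference estimand, the function names, and the toy data are all hypothetical choices.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (index of treated subject, index of matched control)


def greedy_match(ps_treated: Sequence[float], ps_control: Sequence[float],
                 caliper: float) -> List[Pair]:
    """Hypothetical helper: greedy 1:1 nearest-neighbour matching on the
    propensity score, without replacement, within a caliper.
    Treated subjects are processed in the order given, so results are order-dependent."""
    available = set(range(len(ps_control)))
    pairs: List[Pair] = []
    for t, ps_t in enumerate(ps_treated):
        if not available:
            break
        # Closest still-unmatched control on the propensity-score scale.
        best = min(available, key=lambda c: abs(ps_control[c] - ps_t))
        if abs(ps_control[best] - ps_t) <= caliper:
            pairs.append((t, best))
            available.remove(best)  # without replacement
    return pairs


def risk_difference(pairs: Sequence[Pair], y_treated: Sequence[int],
                    y_control: Sequence[int]) -> float:
    """Difference in outcome proportions (treated minus control) in the matched sample."""
    return sum(y_treated[t] - y_control[c] for t, c in pairs) / len(pairs)


def pair_bootstrap_se(pairs: Sequence[Pair], y_treated: Sequence[int],
                      y_control: Sequence[int], n_boot: int = 1000,
                      seed: int = 0) -> float:
    """Bootstrap standard error obtained by resampling matched pairs
    (not individual subjects) with replacement."""
    rng = random.Random(seed)
    estimates = [
        risk_difference(rng.choices(pairs, k=len(pairs)), y_treated, y_control)
        for _ in range(n_boot)
    ]
    return statistics.stdev(estimates)


# Toy data (hypothetical), not taken from any of the studies above.
ps_treated = [0.30, 0.55, 0.70, 0.90]
ps_control = [0.28, 0.52, 0.73, 0.10]
y_treated = [1, 0, 1, 1]
y_control = [0, 0, 1, 0]

pairs = greedy_match(ps_treated, ps_control, caliper=0.05)
print(pairs)  # [(0, 0), (1, 1), (2, 2)]; the last treated subject has no control within the caliper
print(risk_difference(pairs, y_treated, y_control))  # 0.3333333333333333
print(pair_bootstrap_se(pairs, y_treated, y_control))
```

Resampling whole pairs keeps each treated subject together with its matched control, which is the defining feature of the first bootstrap method. The second method described in that abstract would instead re-estimate the propensity score and redo the matching inside every bootstrap sample drawn from the original data.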

