Improving propensity score weighting using machine learning

Department of Epidemiology and Biostatistics, Drexel University School of Public Health, Philadelphia, PA 19102, U.S.A.
Statistics in Medicine (Impact Factor: 2.04). 01/2009; 29(3):337-46. DOI: 10.1002/sim.3782
Source: PubMed

ABSTRACT Machine learning techniques such as classification and regression trees (CART) have been suggested as promising alternatives to logistic regression for the estimation of propensity scores. The authors examined the performance of various CART-based propensity score models using simulated data. Hypothetical studies of varying sample sizes (n=500, 1000, 2000) with a binary exposure, continuous outcome, and 10 covariates were simulated under seven scenarios differing by degree of non-linear and non-additive associations between covariates and the exposure. Propensity score weights were estimated using logistic regression (all main effects), CART, pruned CART, and the ensemble methods of bagged CART, random forests, and boosted CART. Performance metrics included covariate balance, standard error, per cent absolute bias, and 95 per cent confidence interval (CI) coverage. All methods displayed generally acceptable performance under conditions of either non-linearity or non-additivity alone. However, under conditions of both moderate non-additivity and moderate non-linearity, logistic regression had subpar performance, whereas ensemble methods provided substantially better bias reduction and more consistent 95 per cent CI coverage. The results suggest that ensemble methods, especially boosted CART, may be useful for propensity score weighting.
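The weighting step and two of the performance metrics named in the abstract (covariate balance and per cent absolute bias) can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated by some model (logistic regression, boosted CART, or any other), and it assumes ATE-type inverse probability weights, since the abstract does not state the weight form. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import variance


def iptw_weights(treated, propensity):
    """Inverse probability weights (ATE form, an assumption):
    1/e for exposed units, 1/(1 - e) for unexposed units."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_group_mean(values, treated, weights, group):
    """Weighted mean of `values` within one exposure group (0 or 1)."""
    num = sum(v * w for v, t, w in zip(values, treated, weights) if t == group)
    den = sum(w for t, w in zip(treated, weights) if t == group)
    return num / den


def weighted_difference(values, treated, weights):
    """Weighted mean among the exposed minus weighted mean among the unexposed."""
    return (weighted_group_mean(values, treated, weights, 1)
            - weighted_group_mean(values, treated, weights, 0))


def standardized_mean_difference(covariate, treated, weights):
    """Covariate balance: weighted mean difference divided by the
    unweighted pooled standard deviation of the two groups."""
    var1 = variance([x for x, t in zip(covariate, treated) if t == 1])
    var0 = variance([x for x, t in zip(covariate, treated) if t == 0])
    return weighted_difference(covariate, treated, weights) / sqrt((var1 + var0) / 2.0)


def percent_absolute_bias(estimate, truth):
    """Absolute bias of an effect estimate as a percentage of the true effect."""
    return 100.0 * abs(estimate - truth) / abs(truth)


# Hypothetical toy data: exposure indicator, estimated propensity scores,
# one covariate, and a continuous outcome.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
covariate = [2.0, 1.0, 1.0, 0.0]
outcome = [3.0, 2.0, 1.0, 0.0]

weights = iptw_weights(treated, propensity)
unit_weights = [1.0] * len(treated)

smd_raw = standardized_mean_difference(covariate, treated, unit_weights)
smd_weighted = standardized_mean_difference(covariate, treated, weights)
effect = weighted_difference(outcome, treated, weights)
```

In a simulation such as the one described, the true effect is known, so `percent_absolute_bias(effect, true_effect)` could be averaged over replicates for each propensity score model. Comparing `smd_weighted` with `smd_raw` shows how much the weights improve balance on a given covariate.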

  •
    ABSTRACT: Inhaled corticosteroid/long-acting β2-agonist combinations (ICS/LABA) have emerged as first line therapies for chronic obstructive pulmonary disease (COPD) patients with exacerbation history. No randomized clinical trial has compared exacerbation rates among COPD patients receiving budesonide/formoterol combination (BFC) and fluticasone/salmeterol combination (FSC) to date, and only limited comparative data are available. This study compared the real-world effectiveness of approved BFC and FSC treatments among matched cohorts of COPD patients in a large US managed care setting. COPD patients (≥40 years) naive to ICS/LABA who initiated BFC or FSC treatments between 03/01/2009-03/31/2012 were identified in a geographically diverse US managed care database and followed for 12 months; index date was defined as first prescription fill date. Patients with a cancer diagnosis or chronic (≥180 days) oral corticosteroid (OCS) use within 12 months prior to index were excluded. Patients were matched 1-to-1 on demographic and pre-initiation clinical characteristics using propensity scores from a random forest model. The primary efficacy outcome was COPD exacerbation rate, and secondary efficacy outcomes included exacerbation rates by event type and healthcare resource utilization. Pneumonia objectives included rates of any diagnosis of pneumonia and pneumonia-related healthcare resource utilization. Matching of the identified 3,788 BFC and 6,439 FSC patients resulted in 3,697 patients in each group. Matched patients were well balanced on age (mean = 64 years), gender (BFC: 52% female; FSC: 54%), prior COPD-related medication use, healthcare utilization, and comorbid conditions. 
    During follow-up, no significant difference was seen between BFC and FSC patients for number of COPD-related exacerbations overall (rate ratio [RR] = 1.02, 95% CI = [0.96,1.09], p = 0.56) or by event type: COPD-related hospitalizations (RR = 0.96), COPD-related ED visits (RR = 1.11), and COPD-related office/outpatient visits with OCS and/or antibiotic use (RR = 1.01). The proportion of patients diagnosed with pneumonia during the post-index period was similar for patients in each group (BFC = 17.3%, FSC = 19.0%, odds ratio = 0.92 [0.81,1.04], p = 0.19), and no difference was detected for pneumonia-related healthcare utilization by place of service. This study demonstrated no difference in COPD-related exacerbations or pneumonia events between BFC and FSC treatment groups for patients new to ICS/LABA treatment in a real-world setting. ClinicalTrials.gov identifier: NCT01921127.
    Respiratory research 04/2015; 16(1):52. DOI:10.1186/s12931-015-0210-x · 3.38 Impact Factor
  •
    ABSTRACT: Most epidemiology textbooks that discuss models are vague on details of model selection. This lack of detail may be understandable since selection should be strongly influenced by features of the particular study, including contextual (prior) information about covariates that may confound, modify, or mediate the effect under study. It is thus important that authors document their modeling goals and strategies and understand the contextual interpretation of model parameters and model selection criteria. To illustrate this point, we review several established strategies for selecting model covariates, describe their shortcomings, and point to refinements, assuming that the main goal is to derive the most accurate effect estimates obtainable from the data and available resources. This goal shifts the focus to prediction of exposure or potential outcomes (or both) to adjust for confounding; it thus differs from the goal of ordinary statistical modeling, which is to passively predict outcomes. Nonetheless, methods and software for passive prediction can be used for causal inference as well, provided that the target parameters are shifted appropriately.
    Annual Review of Public Health 03/2015; 36:89-108. DOI:10.1146/annurev-publhealth-031914-122559 · 6.63 Impact Factor
  • Epidemiology (Cambridge, Mass.) 03/2015; 26(2):e14-5. DOI:10.1097/EDE.0000000000000237 · 6.18 Impact Factor

