The role of the c-statistic in variable selection for propensity score models

Department of Obstetrics and Gynecology and Duke Global Health Institute, Duke University, Durham, NC, USA.
Pharmacoepidemiology and Drug Safety (Impact Factor: 3.17). 03/2011; 20(3):317-20. DOI: 10.1002/pds.2074
Source: PubMed

ABSTRACT The applied literature on propensity scores has often cited the c-statistic as a measure of the ability of the propensity score to control confounding. However, a high c-statistic in the propensity model is neither necessary nor sufficient for control of confounding. Moreover, use of the c-statistic as a guide in constructing propensity scores may result in less overlap in propensity scores between treated and untreated subjects; this may require the analyst to restrict populations for inference. Such restrictions may reduce precision of estimates and change the population to which the estimate applies. Variable selection based on prior subject matter knowledge, empirical observation, and sensitivity analysis is preferable and avoids many of these problems.
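As a rough illustration of the abstract's point about overlap, the sketch below computes the c-statistic as the proportion of treated/untreated pairs in which the treated subject has the higher propensity score (ties count one half), alongside a simple common-support fraction. All scores are made-up numbers, and the function names `c_statistic` and `common_support_fraction` are hypothetical helpers, not anything from the article. The sketch only shows that discrimination and overlap can move in opposite directions; it says nothing about whether confounding is controlled.

```python
from itertools import product


def c_statistic(ps_treated, ps_untreated):
    """Share of treated/untreated pairs where the treated score is higher (ties = 1/2)."""
    pairs = list(product(ps_treated, ps_untreated))
    wins = sum(1.0 if t > u else 0.5 if t == u else 0.0 for t, u in pairs)
    return wins / len(pairs)


def common_support_fraction(ps_treated, ps_untreated):
    """Share of all subjects whose score lies in the range covered by both groups."""
    lo = max(min(ps_treated), min(ps_untreated))
    hi = min(max(ps_treated), max(ps_untreated))
    everyone = list(ps_treated) + list(ps_untreated)
    return sum(lo <= p <= hi for p in everyone) / len(everyone)


# Hypothetical scores from a model with modest discrimination.
modest_t = [0.40, 0.50, 0.60, 0.70]
modest_u = [0.30, 0.40, 0.50, 0.60]

# Hypothetical scores after adding a strong predictor of treatment
# (for example, a variable unrelated to the outcome).
sharp_t = [0.55, 0.80, 0.90, 0.95]
sharp_u = [0.05, 0.10, 0.20, 0.60]

for label, t, u in [("modest", modest_t, modest_u), ("sharp", sharp_t, sharp_u)]:
    print(label, c_statistic(t, u), common_support_fraction(t, u))
```

In this toy data the second model has the higher c-statistic but far fewer subjects inside the shared score range, which is the situation in which the analyst may have to restrict the population for inference.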

Available from: Daniel Westreich, Sep 21, 2014
  • Source
    • "... statistic, Hosmer–Lemeshow statistic, or any other measure of goodness-of-fit to select variables for inclusion in our models for the weights because doing so can lead to bias (from unbalanced confounders or balanced nonconfounders including instrumental variables), reduced precision, nonpositivity, and/or restricted inference (Westreich et al. 2011). To informally assess the bias–variance tradeoff (Winer 1978), we progressively truncated the overall stabilized weights by resetting weights less (or greater) than a certain percentile to the value of that percentile (Cole and Hernán 2008). Regarding the ORs derived from the untruncated weights as the "true" values, we ..."
    ABSTRACT: Pesticide exposure may be positively associated with depression. Few previous studies considered the episodic nature of depression or examined individual pesticides.
    Environmental Health Perspectives 06/2014; 122(9). DOI:10.1289/ehp.1307450 · 7.03 Impact Factor
  • Source
    ABSTRACT: We investigated whether neighborhood socioeconomic characteristics, measured within person-centered areas (ie, centered on individuals' residences) are associated with body mass index (BMI [kg/m²]) and waist circumference. We used propensity-score matching as a diagnostic and validation tool to examine whether socio-spatial segregation (and related structural confounding) allowed us to estimate neighborhood socioeconomic effects adjusted for individual socioeconomic characteristics without excessive model extrapolations. Using the RECORD (Residential Environment and CORonary heart Disease) Cohort Study, we conducted cross-sectional analyses of 7230 adults from the Paris region. We first estimated the relationships of 3 neighborhood socioeconomic indicators (education, income, real estate prices) with BMI and waist circumference using traditional multilevel regression models adjusted for individual covariates. Second, we examined whether these associations persisted when estimated among participants exchangeable based on their probability of living in low-socioeconomic-status neighborhoods (propensity-score matched samples). After adjustment for covariates, BMI/waist circumference increased with decreasing neighborhood socioeconomic status, especially with neighborhood education measured within 500-m radius buffers around residences; associations were stronger for women. With propensity-score matching techniques, there was some overlap in the odds of exposure between exposed and unexposed populations. As a function of socio-spatial segregation and an indicator of whether the data support inferences, sample size decreased by 17%-59% from the initial to the propensity-score matched samples. Propensity-score matched models confirmed relationships obtained from models in the entire sample. Overall, adjusted associations between neighborhood socioeconomic variables and BMI/waist circumference were empirically estimable in the French context, without excessive model extrapolations, despite the extent of socio-spatial segregation.
    Epidemiology (Cambridge, Mass.) 06/2011; 22(5):694-703. DOI:10.1097/EDE.0b013e3182257784 · 6.18 Impact Factor
  • ABSTRACT: Propensity score (PS) methods aim to control for confounding by balancing confounders between exposed and unexposed subjects with the same PS. PS balance measures have been compared in simulated data but only to a limited extent in empirical data. Our objective was to compare balance measures in clinical data and to assess the association between long-acting inhalation beta-agonist (LABA) use and myocardial infarction. We estimated the relationship between LABA use and myocardial infarction in a cohort of adults with a diagnosis of asthma or chronic obstructive pulmonary disorder from the Utrecht General Practitioner Research Network database. More than two thousand PS models, including information on the observed confounders age, sex, diabetes, cardiovascular disease and chronic obstructive pulmonary disorder status, were applied. The balance of these confounders was assessed using the standardised difference (SD), Kolmogorov-Smirnov (KS) distance and overlapping coefficient. Correlations between these balance measures were calculated. In addition, simulation studies were performed to assess the correlation between balance measures and bias. LABA use was not related to myocardial infarction after conditioning on the PS (median hazard ratio = 1.14, 95% CI = 0.47-2.75). When the different balance measures were used to select a PS model, similar associations were obtained. In our empirical data, SD and KS distance were highly correlated balance measures (r = 0.92). In simulations, SD, KS distance and overlapping coefficient were similarly correlated with bias (e.g. r = 0.55, r = 0.52 and r = -0.57, respectively, when conditioning on the PS). We recommend using the SD or the KS distance to quantify the balance of confounder distributions when applying PS methods.
    Pharmacoepidemiology and Drug Safety 11/2011; 20(11):1130-7. DOI:10.1002/pds.2251 · 3.17 Impact Factor
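
The standardised difference and KS distance named in the last abstract above can be sketched for a single continuous confounder as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the standardised difference uses the common pooled-standard-deviation form, the KS distance is the largest gap between the two empirical distribution functions, the ages are invented, and the overlapping coefficient is left out.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(x_exposed, x_unexposed):
    """Absolute difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(x_exposed) + variance(x_unexposed)) / 2)
    return abs(mean(x_exposed) - mean(x_unexposed)) / pooled_sd


def ks_distance(x_exposed, x_unexposed):
    """Largest vertical gap between the two empirical cumulative distributions."""
    def ecdf(sample, value):
        return sum(v <= value for v in sample) / len(sample)

    points = sorted(set(x_exposed) | set(x_unexposed))
    return max(abs(ecdf(x_exposed, p) - ecdf(x_unexposed, p)) for p in points)


# Hypothetical ages of subjects within one propensity score stratum.
age_exposed = [60, 62, 64, 66, 68]
age_unexposed = [58, 60, 62, 64, 66]

print(standardized_difference(age_exposed, age_unexposed))
print(ks_distance(age_exposed, age_unexposed))
```

Both functions return 0 when the two samples are identical, and larger values indicate worse balance on that confounder; in practice they would be computed per confounder within PS strata or in the matched or weighted sample.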