Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods

MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR, UK.
BMC Medical Research Methodology (Impact Factor: 2.17). 04/2012; 12:46. DOI: 10.1186/1471-2288-12-46
Source: PubMed

ABSTRACT Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X^2. In 'passive imputation' a value X* is imputed for X and then X^2 is imputed as (X*)^2. A recent proposal is to treat X^2 as 'just another variable' (JAV) and impute X and X^2 under multivariate normality.
We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X^2; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study.
JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias.
Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.
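
The contrast between passive imputation and JAV described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' simulation design: X is standard normal, Y = X + X^2 + noise, X is missing completely at random with probability 0.5, and the imputations are "improper" (they plug in fitted regression parameters rather than drawing them from a posterior, and no Rubin's-rules variance is computed). PMM is omitted. All function names (simulate, impute_passive, impute_jav, analyse, pooled_estimate) are hypothetical and chosen for this illustration.

```python
import numpy as np


def ols(design, target):
    """Least-squares coefficients of target regressed on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def simulate(n, rng, beta=(0.0, 1.0, 1.0), p_missing=0.5):
    """Assumed data model: Y = b0 + b1*X + b2*X^2 + e, with X missing completely at random."""
    x = rng.normal(size=n)
    y = beta[0] + beta[1] * x + beta[2] * x**2 + rng.normal(size=n)
    missing = rng.random(n) < p_missing
    x_seen = np.where(missing, np.nan, x)  # hide the values that are "missing"
    return x_seen, y, missing


def impute_passive(x, y, missing, rng):
    """Impute X from a linear regression on Y, then set X^2 := (X*)^2."""
    obs = ~missing
    d_obs = np.column_stack([np.ones(obs.sum()), y[obs]])
    coef = ols(d_obs, x[obs])
    resid = x[obs] - d_obs @ coef
    sd = np.sqrt(resid @ resid / (obs.sum() - 2))
    d_mis = np.column_stack([np.ones(missing.sum()), y[missing]])
    x_imp = x.copy()
    x_imp[missing] = d_mis @ coef + rng.normal(scale=sd, size=missing.sum())
    return x_imp, x_imp**2


def impute_jav(x, y, missing, rng):
    """JAV: treat (X, X^2) as two ordinary variables, bivariate normal given Y."""
    obs = ~missing
    z_obs = np.column_stack([x[obs], x[obs] ** 2])
    d_obs = np.column_stack([np.ones(obs.sum()), y[obs]])
    coef = ols(d_obs, z_obs)  # one column of coefficients per imputed variable
    resid = z_obs - d_obs @ coef
    cov = resid.T @ resid / (obs.sum() - 2)
    d_mis = np.column_stack([np.ones(missing.sum()), y[missing]])
    noise = rng.multivariate_normal(np.zeros(2), cov, size=missing.sum())
    z_mis = d_mis @ coef + noise
    x_imp, x2_imp = x.copy(), x**2
    x_imp[missing] = z_mis[:, 0]
    x2_imp[missing] = z_mis[:, 1]  # imputed X^2 need not equal (imputed X)^2
    return x_imp, x2_imp


def analyse(x_imp, x2_imp, y):
    """Analysis model: linear regression of Y on X and X^2."""
    return ols(np.column_stack([np.ones_like(y), x_imp, x2_imp]), y)


def pooled_estimate(impute, x, y, missing, rng, m=5):
    """Average the analysis-model coefficients over m imputed data sets."""
    fits = [analyse(*impute(x, y, missing, rng), y) for _ in range(m)]
    return np.mean(fits, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(2012)
    x, y, missing = simulate(5000, rng)
    print("passive:", pooled_estimate(impute_passive, x, y, missing, rng))
    print("JAV:    ", pooled_estimate(impute_jav, x, y, missing, rng))
```

In this setting the JAV estimate of the quadratic coefficient would be expected to lie close to the assumed true value of 1, while passive imputation would be expected to shrink it, in line with the abstract's statement that JAV is consistent for linear regression when X is missing completely at random. The sketch says nothing about the missing-at-random or logistic-regression scenarios, where the abstract reports that JAV can perform poorly.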

  • Source
    ABSTRACT: Missing data are a frequent problem in cost-effectiveness analysis (CEA) within a randomised controlled trial. Inappropriate methods to handle missing data can lead to misleading results and ultimately can affect the decision of whether an intervention is good value for money. This article provides practical guidance on how to handle missing data in within-trial CEAs following a principled approach: (i) the analysis should be based on a plausible assumption for the missing data mechanism, i.e. whether the probability that data are missing is independent of or dependent on the observed and/or unobserved values; (ii) the method chosen for the base-case should fit with the assumed mechanism; and (iii) sensitivity analysis should be conducted to explore to what extent the results change with the assumption made. This approach is implemented in three stages, which are described in detail: (1) descriptive analysis to inform the assumption on the missing data mechanism; (2) how to choose between alternative methods given their underlying assumptions; and (3) methods for sensitivity analysis. The case study illustrates how to apply this approach in practice, including software code. The article concludes with recommendations for practice and suggestions for future research.
    PharmacoEconomics 07/2014; 32(12). DOI:10.1007/s40273-014-0193-3 · 3.34 Impact Factor
  • Source
    ABSTRACT: Background: The evidence for choices between antipsychotics for children and adolescents with schizophrenia and other psychotic disorders is limited. The main objective of the Tolerability and Efficacy of Antipsychotics (TEA) trial is to compare the benefits and harms of quetiapine versus aripiprazole in children and adolescents with psychosis in order to inform rational, effective and safe treatment selections. Methods/Design: The TEA trial is a Danish investigator-initiated, independently funded, multi-centre, randomised, blinded clinical trial. Based on sample size estimation, 112 patients aged 12-17 years with psychosis, antipsychotic-naïve or treated for a limited period, are randomised 1:1 to a 12-week, double-blind intervention with quetiapine versus aripiprazole. Effects on psychopathology, cognition, health-related quality of life, and adverse events are assessed 2, 4, and 12 weeks after randomisation. The primary outcome is change in the positive symptom score of the Positive and Negative Syndrome Scale. The recruitment period is 2010-2014. Discussion: Antipsychotics are currently the only available pharmacologic treatments for psychotic disorders. However, information about head-to-head differences in efficacy and tolerability of antipsychotics is scarce in children and adolescents. The TEA trial aims at expanding the evidence base for the use of antipsychotics in early onset psychosis in order to inform more rational treatment decisions in this vulnerable population. Here, we account for the trial design, address methodological challenges, and discuss the estimation of sample size. Trial registration: NCT01119014
    BMC Psychiatry 07/2014; 14(1):199. DOI:10.1186/1471-244X-14-199 · 2.24 Impact Factor
  • Source
    ABSTRACT: It is essential for research funding organizations to ensure both the validity and fairness of the grant approval procedure. The ex-ante peer evaluation (EXANTE) of N = 8,496 grant applications submitted to the Austrian Science Fund from 1999 to 2009 was statistically analyzed. For 1,689 funded research projects an ex-post peer evaluation (EXPOST) was also available; for the rest of the grant applications a multilevel missing data imputation approach was used to consider verification bias for the first time in peer-review research. Without imputation, the predictive validity of EXANTE was low (r = .26) but underestimated due to verification bias, and with imputation it was r = .49. That is, the decision-making procedure is capable of selecting the best research proposals for funding. In the EXANTE there were several potential biases (e.g., gender). With respect to the EXPOST there was only one real bias (discipline-specific and year-specific differential prediction). The novelty of this contribution is, first, the combining of theoretical concepts of validity and fairness with a missing data imputation approach to correct for verification bias and, second, multilevel modeling to test peer review-based funding decisions for both validity and fairness in terms of potential and real biases.
    Journal of the Association for Information Science and Technology 08/2014; DOI:10.1002/asi.23315 · 2.23 Impact Factor

