Project

Prediction calibration using multiple imputations to account for missing predictor values

Goal: In this project we study the application of multiple imputation to account for missing values in predictors when the objective is to calibrate a predictor for use on future, novel observations. To implement the methods, we are constructing an R package, "mipred", which allows users to calibrate prediction rules using multiple imputations to account for missing predictor values.

The mipred package will support calibration using both generalized linear models and Cox regression modeling for censored lifetime outcomes. The approach is described for binary outcomes by Mertens, Banzato and de Wreede (2019) (https://arxiv.org/abs/1810.05099). Imputations are generated using the package mice, without using the outcomes of the observations for which predictions are generated. Two options are provided to generate predictions. The first is prediction-averaging: predictions are calibrated from single models, each fitted on a single imputed dataset within a set of multiple imputations, and then averaged. The second is application of the Rubin's rules pooled model. For both implementations, unobserved values in the predictor data of the new observations for which predictions are derived are imputed automatically. The present version of the package is preliminary (development) and has so far been checked only for binary-outcome logistic regression. We are working to expand the functionality to non-binary and survival outcomes.
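The two prediction options can be illustrated with a minimal base-R sketch. This is not the mipred implementation: a crude stochastic regression imputation (predictors only, no parameter draws) stands in for mice, the way the pooled coefficients are applied to the imputed new observations is one plausible reading rather than the package's documented behaviour, and all data and names (`impute_once`, `p_averaging`, `p_rubin`) are hypothetical.

```r
# Toy illustration only: base R, with a crude stochastic regression
# imputation standing in for mice. All names are hypothetical.
set.seed(1)

# Simulated calibration data: binary outcome, two predictors, x2 partly missing.
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + x1 + 0.8 * x2))
calib <- data.frame(y = y, x1 = x1, x2 = x2)
calib$x2[sample(n, 40)] <- NA

# New observations: outcome unknown, one x2 value missing.
newobs <- data.frame(x1 = c(-1, 0, 1), x2 = c(0.3, NA, -0.2))

# One imputed copy of the stacked predictor data. The imputation model
# uses predictors only, so no outcome information is involved at all.
impute_once <- function(pred) {
  obs <- !is.na(pred$x2)
  fit <- lm(x2 ~ x1, data = pred[obs, ])
  mu  <- predict(fit, newdata = pred[!obs, ])
  pred$x2[!obs] <- mu + rnorm(sum(!obs), sd = summary(fit)$sigma)
  pred
}

M       <- 10
stacked <- rbind(calib[, c("x1", "x2")], newobs)
is_new  <- c(rep(FALSE, n), rep(TRUE, nrow(newobs)))

probs <- matrix(NA_real_, nrow(newobs), M)  # per-imputation predictions
betas <- matrix(NA_real_, M, 3)             # per-imputation coefficients
new_X <- vector("list", M)                  # imputed new-observation designs

for (m in seq_len(M)) {
  imp   <- impute_once(stacked)
  train <- data.frame(y = calib$y, imp[!is_new, ])
  test  <- imp[is_new, ]
  fit   <- glm(y ~ x1 + x2, family = binomial, data = train)
  probs[, m]  <- predict(fit, newdata = test, type = "response")
  betas[m, ]  <- coef(fit)
  new_X[[m]]  <- cbind(1, as.matrix(test))
}

# Option 1: average the predictions of the M single-imputation models.
p_averaging <- rowMeans(probs)

# Option 2: pool the coefficients (point-estimate part of Rubin's rules),
# apply the pooled model to each imputed copy of the new data, and average.
beta_pooled <- colMeans(betas)
p_rubin <- rowMeans(sapply(new_X, function(X) plogis(drop(X %*% beta_pooled))))

print(cbind(p_averaging, p_rubin))
```

Because the logistic link is nonlinear, the two options generally give different probabilities even though they use the same set of imputations.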


Project log

Bart J A Mertens
added a research item
In this paper, we extend the methodology presented in Mertens et al. (2020, Biometrical Journal) to lifetime (survival) outcomes that are subject to censoring, where imputation is used to account for missing values. We consider the problem where missing values can occur both in the calibration data and in the new, to-be-predicted observations (validation). We focus on the Cox model. Methods are described to combine imputation with predictive calibration in survival modeling subject to censoring. Application to cross-validation is discussed. We demonstrate how the conclusions broadly confirm those of the first paper, which was restricted to binary outcomes. Specifically, prediction-averaging appears to have superior statistical properties, especially smaller predictive variation, compared with a direct application of Rubin's rules. Distinct methods for dealing with the baseline hazards are discussed when using Rubin's rules-based approaches.
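To make the contrast concrete for the Cox model, the following is a hedged sketch in notation that is not taken from the paper. Assume $M$ imputations, with $\hat{\beta}^{(m)}$ and $\hat{H}_0^{(m)}(t)$ the coefficient vector and baseline cumulative hazard estimated on the $m$-th imputed calibration set, and $\tilde{x}^{(m)}$ the $m$-th completed predictor vector of a new observation. The two baseline options shown for the pooled model are illustrative assumptions and are not necessarily the methods discussed in the paper.

```latex
% Prediction-averaging: average the predicted survival probabilities.
\hat{S}_{\mathrm{avg}}(t \mid \tilde{x})
  = \frac{1}{M} \sum_{m=1}^{M}
    \exp\!\left\{ -\hat{H}_0^{(m)}(t)\,
                  \exp\!\left( \tilde{x}^{(m)\top} \hat{\beta}^{(m)} \right) \right\}

% Rubin's rules: pool the coefficients first.
\bar{\beta} = \frac{1}{M} \sum_{m=1}^{M} \hat{\beta}^{(m)}

% The pooled model still needs a baseline cumulative hazard, for example
% (a) an average of the per-imputation baselines, or
% (b) a baseline re-estimated with the coefficients fixed at \bar{\beta}.
\bar{H}_0(t) = \frac{1}{M} \sum_{m=1}^{M} \hat{H}_0^{(m)}(t)
\qquad \text{or} \qquad
\bar{H}_0(t) = \hat{H}_0(t;\, \bar{\beta})
```

Unlike the binary case, pooling coefficients alone does not determine a prediction here, which is why the treatment of the baseline hazard has to be chosen separately.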
Bart J A Mertens
added a research item
The package `mipred` contains two basic functions. The first is `mipred.cv`, which estimates cross-validated predictions using multiple imputation when predictors contain missing values. The second is `mipred`, which allows users to apply the same methodology to predict outcomes for novel observations, based on past calibration data. Both the new observations and the calibration data may contain missing values in the predictor data. This document describes data analysis approaches using the `mipred` package functions for the above objectives. We first discuss cross-validation of prediction with `mipred.cv`, using both the `averaging` and `rubin` methods as described in the paper by Mertens et al. (see ResearchGate), to estimate the expected prediction performance on future data. We subsequently describe use of the `mipred` function to estimate predictions on new patient data, based on past data. Finally, `mipred` package functionality and options are discussed.
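The idea behind cross-validated prediction with the `averaging` method can be sketched in base R as follows. This is a simplified illustration and not the code behind `mipred.cv`: a crude stochastic regression imputation replaces mice, the imputer sees predictors only (so held-out outcomes trivially cannot leak into it), and the data and names (`impute_x2`, `cv_pred`, `brier`) are hypothetical.

```r
# Hypothetical base-R sketch of cross-validated prediction-averaging.
# A crude stochastic regression imputation stands in for mice.
set.seed(42)

# Simulated data: binary outcome, x2 partly missing.
n  <- 150
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- rbinom(n, 1, plogis(x1 + x2))
x2[sample(n, 30)] <- NA
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# One stochastic imputation of x2 given x1 (predictors only).
impute_x2 <- function(d) {
  miss <- is.na(d$x2)
  fit  <- lm(x2 ~ x1, data = d[!miss, ])
  d$x2[miss] <- predict(fit, newdata = d[miss, ]) +
    rnorm(sum(miss), sd = summary(fit)$sigma)
  d
}

K <- 5                                   # number of folds
M <- 5                                   # imputations per fold
fold    <- sample(rep(seq_len(K), length.out = n))
cv_pred <- numeric(n)

for (k in seq_len(K)) {
  held <- fold == k
  p <- matrix(NA_real_, sum(held), M)
  for (m in seq_len(M)) {
    # The outcome column is never passed to the imputer.
    imp   <- impute_x2(dat[, c("x1", "x2")])
    train <- data.frame(y = dat$y, imp)[!held, ]
    # The model is fitted without the held-out fold ...
    fit <- glm(y ~ x1 + x2, family = binomial, data = train)
    # ... and then used to predict the held-out, imputed predictors.
    p[, m] <- predict(fit, newdata = imp[held, ], type = "response")
  }
  cv_pred[held] <- rowMeans(p)           # prediction-averaging
}

# Cross-validated Brier score as one possible performance measure.
brier <- mean((dat$y - cv_pred)^2)
print(brier)
```

Each observation receives a prediction from models that never saw its outcome, which is what makes the resulting performance estimate a fair proxy for performance on future data.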
Bart J A Mertens
added 2 research items
The latest version of the software can be downloaded from GitHub: https://github.com/BartJAMertens/mipred. We are happy to say that the package is now also on CRAN, so you can install it directly from there as well.
We investigate the problem of calibration and assessment of predictive rules in prognostic designs when missing values are present in the predictors. Our paper has two key objectives which are entwined. The first is to investigate how the calibration of the prediction rule can be combined with the use of multiple imputation to account for missing predictor observations. The second objective is to propose such methods that can be implemented with current multiple imputation software, while allowing for unbiased predictive assessment through validation on new observations for which outcome is not yet available. To inform the definition of methodology, we commence with a review of the theoretical background of multiple imputation as a model estimation approach as opposed to a purely algorithmic description. We specifically contrast application of multiple imputation for parameter (effect) estimation with predictive calibration. Based on this review, two approaches are formulated, of which the second utilizes application of the classical Rubin's rules for parameter estimation, while the first approach averages probabilities from models fitted on single imputations to directly approximate the predictive density for future observations. We present implementations using current software which allow for validatory or cross-validatory estimation of performance measures, as well as imputation of missing data in predictors on the future data where outcome is by definition as yet unobserved. We restrict discussion to binary outcome and logistic regression throughout, though the principles discussed are generally applicable. We present two data sets as examples from our regular consultative practice. Results show little difference between methods for accuracy but substantial reductions in variation of calibrated probabilities when using the first approach.
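The two approaches can be written compactly as follows, in notation that is assumed here rather than taken from the paper. Let $m = 1, \dots, M$ index the imputations, let $\hat{\beta}^{(m)}$ be the logistic regression coefficients fitted on the $m$-th imputed calibration set, and let $\tilde{x}^{(m)}$ be the $m$-th completed predictor vector of a future observation. How the pooled coefficients are combined with the imputed future predictors is an assumption of this sketch.

```latex
\operatorname{expit}(u) = \frac{1}{1 + e^{-u}}

% First approach: average the probabilities of the single-imputation models.
\hat{p}_{\mathrm{avg}}(\tilde{x})
  = \frac{1}{M} \sum_{m=1}^{M}
    \operatorname{expit}\!\left( \tilde{x}^{(m)\top} \hat{\beta}^{(m)} \right)

% Second approach: pool the coefficients with Rubin's rules, then predict.
\bar{\beta} = \frac{1}{M} \sum_{m=1}^{M} \hat{\beta}^{(m)},
\qquad
\hat{p}_{\mathrm{rubin}}(\tilde{x})
  = \frac{1}{M} \sum_{m=1}^{M}
    \operatorname{expit}\!\left( \tilde{x}^{(m)\top} \bar{\beta} \right)
```

Since $\operatorname{expit}$ is nonlinear, averaging probabilities and predicting from averaged coefficients are generally not the same, so the two approaches can yield different calibrated probabilities from identical imputations.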