Aggregating published prediction models with individual participant data: A comparison of different approaches

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
Statistics in Medicine (Impact Factor: 1.83). 06/2012; 31(23):2697-712. DOI: 10.1002/sim.5412
Source: PubMed


During recent decades, interest in prediction models has increased substantially, but approaches to synthesize evidence from previously developed models have failed to keep pace. As a result, researchers often ignore potentially useful past evidence when developing a novel prediction model with individual participant data (IPD) from their population of interest. We aimed to evaluate approaches for aggregating previously published prediction models with new data. We consider the situation in which models are reported in the literature with predictors similar to those available in an IPD dataset. We adopt a two-stage method and explore three approaches to calculate a synthesis model, relying on the principles of multivariate meta-analysis. These approaches range from a naive pooling strategy to one that accounts for within-study and between-study covariance. They are applied to a collection of 15 datasets of patients with traumatic brain injury, and to five previously published models for predicting deep venous thrombosis. We also illustrate how the generally unrealistic assumption of consistency in the availability of evidence across included studies can be relaxed. Results from the case studies demonstrate that aggregation yields prediction models with improved discrimination and calibration in the vast majority of scenarios, and equivalent performance (compared with the standard approach) in a small minority of situations. The proposed aggregation approaches are particularly useful when few participant data are at hand. Assessing the degree of heterogeneity between IPD and literature findings remains crucial for determining the optimal approach to aggregating previous evidence into new prediction models. Copyright © 2012 John Wiley & Sons, Ltd.
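
A heavily simplified, per-coefficient sketch of the two-stage idea is given below. This is not the article's multivariate meta-analysis model: it merely contrasts a naive (unweighted) pooling of coefficients from the published models and the IPD fit with a univariate DerSimonian-Laird random-effects pooling that adds an estimated between-study variance. All coefficient values and function names are illustrative.

```python
import numpy as np

def naive_pooling(betas):
    """Naive aggregation: unweighted average of the coefficient vectors
    (one row per published model or per IPD fit)."""
    return np.asarray(betas, float).mean(axis=0)

def random_effects_pooling(betas, ses):
    """Per-coefficient DerSimonian-Laird random-effects pooling.

    A univariate stand-in for multivariate meta-analysis: each coefficient is
    pooled with weights combining within-study variance (se**2) and an
    estimated between-study variance tau**2.
    """
    betas = np.asarray(betas, float)
    variances = np.asarray(ses, float) ** 2
    pooled = np.empty(betas.shape[1])
    for j in range(betas.shape[1]):
        b, v = betas[:, j], variances[:, j]
        w = 1.0 / v                                   # fixed-effect weights
        b_fe = np.sum(w * b) / np.sum(w)
        q = np.sum(w * (b - b_fe) ** 2)               # Cochran's Q
        c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
        tau2 = max(0.0, (q - (len(b) - 1)) / c)       # between-study variance
        w_re = 1.0 / (v + tau2)                       # random-effects weights
        pooled[j] = np.sum(w_re * b) / np.sum(w_re)
    return pooled

# Illustrative coefficients (intercept + two predictors) from three published
# models and one model refitted on the new IPD, with their standard errors.
betas = [[-1.2, 0.50, 0.30], [-0.9, 0.55, 0.25],
         [-1.1, 0.45, 0.35], [-1.0, 0.60, 0.20]]
ses = [[0.20, 0.10, 0.08], [0.25, 0.12, 0.09],
       [0.22, 0.11, 0.10], [0.30, 0.15, 0.12]]
print("naive:", naive_pooling(betas))
print("random effects:", random_effects_pooling(betas, ses))
```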

  • Source
    • "Journal editors and peer reviewers can also play a role by demanding clear rationale and evidence for the need of a new prediction model and place more emphasis on studies evaluating prediction models. Recently, developments have been made that combine existing prediction models, thereby improving the generalisability, but importantly not wasting existing research [60,61]. "
    ABSTRACT: Before considering whether to use a multivariable (diagnostic or prognostic) prediction model, it is essential that its performance be evaluated in data that were not used to develop the model (referred to as external validation). We critically appraised the methodological conduct and reporting of external validation studies of multivariable prediction models. We conducted a systematic review of articles describing some form of external validation of one or more multivariable prediction models indexed in PubMed core clinical journals published in 2010. Study data were extracted in duplicate on design, sample size, handling of missing data, reference to the original study developing the prediction models, and predictive performance measures. 11,826 articles were identified and 78 were included for full review; these described the evaluation of 120 prediction models in participant data that were not used to develop the model. Thirty-three articles described both the development of a prediction model and an evaluation of its performance on a separate dataset, and 45 articles described only the evaluation of an existing published prediction model on another dataset. Fifty-seven percent of the prediction models were presented and evaluated as simplified scoring systems. Sixteen percent of articles failed to report the number of outcome events in the validation datasets. Fifty-four percent of studies made no explicit mention of missing data. Sixty-seven percent did not report evaluating model calibration, whilst most studies evaluated model discrimination. It was often unclear whether the reported performance measures were for the full regression model or for the simplified models. The vast majority of studies describing some form of external validation of a multivariable prediction model were poorly reported, with key details frequently not presented. The validation studies were characterised by poor design, inappropriate handling and acknowledgement of missing data, and frequent omission of calibration, one of the key performance measures of prediction models. It may therefore not be surprising that an overwhelming majority of developed prediction models are not used in practice, when there is a dearth of well-conducted and clearly reported (external validation) studies describing their performance on independent participant data.
    BMC Medical Research Methodology 03/2014; 14(1):40. DOI:10.1186/1471-2288-14-40 · 2.27 Impact Factor
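
The review above notes that discrimination is usually reported while calibration is often omitted. As a point of reference, the snippet below sketches both measures on an external dataset: the c-statistic for discrimination and a logistic recalibration (intercept and slope) for calibration. The variable names, the Newton-Raphson fit, and the simulated data are illustrative assumptions, not material from the review.

```python
import numpy as np

def c_statistic(y, risk):
    """Discrimination: probability that a random event receives a higher
    predicted risk than a random non-event (ties counted as 1/2)."""
    y, risk = np.asarray(y), np.asarray(risk)
    diff = risk[y == 1][:, None] - risk[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def calibration_intercept_slope(y, risk, iterations=25):
    """Calibration: logistic recalibration of the observed outcome on the
    model's linear predictor; an intercept near 0 and a slope near 1
    indicate good calibration.  Fitted by plain Newton-Raphson."""
    y = np.asarray(y, float)
    lp = np.log(risk / (1 - risk))           # linear predictor (logit of predicted risk)
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.zeros(2)
    for _ in range(iterations):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        W = mu * (1.0 - mu)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
    return {"intercept": beta[0], "slope": beta[1]}

# Illustrative external validation data: predicted risks from an existing model
# and binary outcomes simulated to be consistent with those risks.
rng = np.random.default_rng(0)
risk = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, risk)
print("c-statistic:", round(c_statistic(y, risk), 3))
print(calibration_intercept_slope(y, risk))
```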
  • Source
    • "The synthesis process of associations from the literature should then account for differences in model specification and included associations. Future research will investigate how these challenges can be assessed [36]. "
    ABSTRACT: Background: Diagnostic and prognostic literature is overwhelmed with studies reporting univariable predictor-outcome associations. Currently, methods to incorporate such information in the construction of a prediction model are underdeveloped and unfamiliar to many researchers. Methods: This article aims to improve upon an adaptation method originally proposed by Greenland (1987) and Steyerberg (2000) to incorporate previously published univariable associations in the construction of a novel prediction model. The proposed method improves upon the variance estimation component by reformulating the adaptation process within established theory and making it more robust. Different variants of the proposed method were tested in a simulation study, where performance was measured by comparing estimated associations with their predefined values according to the mean squared error and the coverage of the 90% confidence intervals. Results: Results demonstrate that the performance of estimated multivariable associations considerably improves for small datasets where external evidence is included. Although the error of estimated associations decreases with increasing amounts of individual participant data, it does not disappear completely, even in very large datasets. Conclusions: The proposed method to aggregate previously published univariable associations with individual participant data in the construction of a novel prediction model outperforms established approaches and is especially worthwhile when relatively limited individual participant data are available.
    BMC Medical Research Methodology 08/2012; 12(1):121. DOI:10.1186/1471-2288-12-121 · 2.27 Impact Factor
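
The adaptation idea referred to above (after Greenland and Steyerberg) can be sketched very roughly: a published univariable log-odds ratio is shifted by an adjustment factor estimated from the IPD and then combined with the IPD estimate by inverse-variance weighting. The sketch below uses a crude variance expression that ignores covariances between the IPD-based estimates, which is precisely the kind of simplification the improved method addresses; it is not the article's estimator, and all numbers are illustrative.

```python
import numpy as np

def adapt_and_combine(beta_lit_uni, var_lit_uni,
                      beta_ipd_multi, var_ipd_multi,
                      beta_ipd_uni, var_ipd_uni):
    """Shift a published univariable association towards a multivariable one
    using the IPD, then combine it with the IPD multivariable estimate by
    inverse-variance weighting (rough sketch only)."""
    delta = beta_ipd_multi - beta_ipd_uni          # adjustment ("adaptation") factor
    beta_adapted = beta_lit_uni + delta            # approximate multivariable literature estimate
    # Crude variance: treats the three components as independent.
    var_adapted = var_lit_uni + var_ipd_multi + var_ipd_uni
    w = np.array([1.0 / var_adapted, 1.0 / var_ipd_multi])
    b = np.array([beta_adapted, beta_ipd_multi])
    return float(w @ b / w.sum()), float(1.0 / w.sum())

# Illustrative numbers: the literature reports a univariable log-odds ratio of
# 0.80; the (small) IPD gives 0.55 multivariable and 0.70 univariable.
beta, var = adapt_and_combine(0.80, 0.04, 0.55, 0.09, 0.70, 0.08)
print(f"combined estimate {beta:.2f} (variance {var:.3f})")
```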
  • Source
    ABSTRACT: The objective of this report is to describe the design and content of the International Mission for Prognosis And Clinical Trial (IMPACT) database of traumatic brain injury, which contains the complete dataset from most clinical trials and organized epidemiologic studies conducted over the past 20 years. This effort, funded by the U.S. National Institutes of Health, has so far led to the accumulation of data from 9,205 patients with severe and moderate brain injuries from eight randomized placebo-controlled trials and three observational studies. Data relevant to the design and analysis of pragmatic Phase III clinical trials, including pre-hospital, admission, and post-resuscitation assessments, information on acute management, and short- and long-term outcome, were merged into a top-priority data set (TPDS). The major emphasis during the first phase of the study is on information from time of injury to post-resuscitation and on outcome at 6 months, thereby providing a unique resource for prognostic analysis and for studies aimed at optimizing the design and analysis of Phase III trials in traumatic brain injury.
    Journal of Neurotrauma 03/2007; 24(2):239-50. DOI:10.1089/neu.2006.0036 · 3.71 Impact Factor