Article

Aggregating published prediction models with individual participant data: a comparison of different approaches.

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
Statistics in Medicine (Impact Factor: 2.04). 06/2012; 31(23):2697-712. DOI:10.1002/sim.5412
Source: PubMed

ABSTRACT During recent decades, interest in prediction models has increased substantially, but approaches to synthesize evidence from previously developed models have failed to keep pace. As a result, researchers ignore potentially useful past evidence when developing a novel prediction model with individual participant data (IPD) from their population of interest. We aimed to evaluate approaches for aggregating previously published prediction models with new data. We consider the situation in which models are reported in the literature with predictors similar to those available in an IPD dataset. We adopt a two-stage method and explore three approaches to calculate a synthesis model, relying on the principles of multivariate meta-analysis. The first approach employs a naive pooling strategy, whereas the other two account for within-study and between-study covariance. These approaches are applied to a collection of 15 datasets of patients with traumatic brain injury, and to five previously published models for predicting deep venous thrombosis. We also illustrate how the generally unrealistic assumption of consistency in the availability of evidence across included studies can be relaxed. Results from the case studies demonstrate that aggregation yields prediction models with improved discrimination and calibration in the vast majority of scenarios, and results in performance equivalent to the standard approach in a small minority of situations. The proposed aggregation approaches are particularly useful when few participant data are at hand. Assessing the degree of heterogeneity between IPD and literature findings remains crucial for determining the optimal approach to aggregating previous evidence into new prediction models. Copyright © 2012 John Wiley & Sons, Ltd.
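The two-stage logic in the abstract (first fit the same model separately in each IPD study, then pool the study-specific coefficients) can be sketched in a few lines. The following Python sketch uses simulated data and a per-coefficient DerSimonian-Laird random-effects pool; the paper's covariance-aware approaches rely on multivariate meta-analysis, which this univariate simplification does not reproduce. All data, sample sizes, and coefficient values are hypothetical.

```python
# Minimal sketch: two-stage aggregation of study-specific logistic models.
# All data are simulated; the per-coefficient DerSimonian-Laird pool below
# ignores between-coefficient covariance, unlike the paper's multivariate
# meta-analysis approaches.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def simulate_study(n, beta):
    """Simulate one IPD study: two predictors and a binary outcome."""
    X = sm.add_constant(rng.normal(size=(n, 2)))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
    return y, X

# Stage 1: fit the same logistic model separately in each study.
true_beta = np.array([-1.0, 0.8, 0.5])
estimates, variances = [], []
for n in (300, 500, 400, 600):
    y, X = simulate_study(n, true_beta)
    res = sm.Logit(y, X).fit(disp=0)
    estimates.append(res.params)
    variances.append(np.diag(res.cov_params()))
b = np.array(estimates)   # k studies x p coefficient estimates
v = np.array(variances)   # matching sampling variances

# Stage 2: DerSimonian-Laird random-effects pooling, one coefficient at a time.
def dl_pool(bj, vj):
    w = 1.0 / vj                               # fixed-effect weights
    mu_fe = np.sum(w * bj) / np.sum(w)
    q = np.sum(w * (bj - mu_fe) ** 2)          # Cochran's Q
    tau2 = max(0.0, (q - (len(bj) - 1)) /
               (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_re = 1.0 / (vj + tau2)                   # random-effects weights
    return np.sum(w_re * bj) / np.sum(w_re)

pooled = np.array([dl_pool(b[:, j], v[:, j]) for j in range(b.shape[1])])
print("pooled coefficients:", np.round(pooled, 3))
```

The estimated between-study variance tau-squared is folded into the pooling weights, so heterogeneous studies pull the synthesis model less strongly toward any single population.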

Related publications:

  • ABSTRACT: Published clinical prediction models are often ignored during the development of novel prediction models, despite similarities in populations and intended usage. The plethora of prediction models that arises from this practice may still perform poorly when applied in other populations. Incorporating prior evidence might improve the accuracy of prediction models and make them more generalizable. Unfortunately, aggregating prediction models is not straightforward, and methods for combining differently specified models are currently lacking. We propose two approaches for aggregating previously published prediction models when a validation dataset is available: model averaging and stacked regressions. These approaches yield user-friendly stand-alone models that are adjusted for the new validation data. Both approaches rely on weighting to account for model performance and between-study heterogeneity, but adopt a different rationale (averaging versus combination) for combining the models. We illustrate their implementation in a clinical example and compare them with established methods for prediction modeling in a series of simulation studies. Results from the clinical datasets and simulation studies demonstrate that aggregation yields prediction models with better discrimination and calibration in the vast majority of scenarios, and results in equivalent performance (compared with developing a novel model from scratch) when validation datasets are relatively large. In conclusion, model aggregation is a promising strategy when several prediction models are available from the literature and a validation dataset is at hand. The aggregation methods do not require existing models to have similar predictors and can be applied when relatively few data are at hand (see the stacking sketch after this list). Copyright © 2014 John Wiley & Sons, Ltd.
    Statistics in Medicine 01/2014; Impact Factor 2.04
  • ABSTRACT: A fundamental aspect of epidemiological studies concerns the estimation of factor-outcome associations to identify risk factors, prognostic factors and potential causal factors. Because reliable estimates of these associations are important, there is growing interest in methods for combining the results from multiple studies in individual participant data meta-analyses (IPD-MA). When there is substantial heterogeneity across studies, various random-effects meta-analysis models are possible that employ a one-stage or two-stage method. These are generally thought to produce similar results, but empirical comparisons are few. We describe and compare several one-stage and two-stage random-effects IPD-MA methods for estimating factor-outcome associations from multiple risk-factor or predictor finding studies with a binary outcome. One-stage methods use the IPD of each study and meta-analyse using the exact binomial distribution, whereas two-stage methods reduce evidence to the aggregated level (e.g. odds ratios) and then meta-analyse assuming approximate normality. We compare the methods in an empirical dataset for unadjusted and adjusted risk-factor estimates. Though often similar, on occasion the one-stage and two-stage methods provide different parameter estimates and different conclusions. For example, the estimated effect of erythema and its statistical significance differed between the one-stage (OR = 1.35) and univariate two-stage (OR = 1.55) analyses. Estimation issues can also arise: two-stage models suffer from unstable estimates when zero cell counts occur, and one-stage models do not always converge. When planning an IPD-MA, the choice and implementation (e.g. univariate or multivariate) of a one-stage or two-stage method should be prespecified in the protocol, as occasionally they lead to different conclusions about which factors are associated with the outcome. Though both approaches can suffer from estimation challenges, we recommend employing the one-stage method, as it uses a more exact statistical approach and accounts for parameter correlation (the two strategies are contrasted in a sketch after this list).
    PLoS ONE 01/2013; 8(4):e60650; Impact Factor 3.73
  • ABSTRACT: The use of individual participant data (IPD) from multiple studies is an increasingly popular approach when developing a multivariable risk prediction model. The corresponding datasets, however, typically differ in important aspects, such as baseline risk. This has driven the adoption of meta-analytical approaches for appropriately dealing with heterogeneity between study populations. Although these approaches provide an averaged prediction model across all studies, little guidance exists on how to apply or validate this model for new individuals or study populations outside the derivation data. We consider several approaches to developing a multivariable logistic regression model from an IPD meta-analysis (IPD-MA) with potential between-study heterogeneity. We also propose strategies for choosing a valid model intercept when the model is to be validated or applied to new individuals or study populations. These strategies can be implemented by the IPD-MA developers or by future model validators. Finally, we show how model generalizability can be evaluated, when external validation data are lacking, using internal-external cross-validation, and we extend our framework to count and time-to-event data. In an empirical evaluation, our results show how stratified estimation allows study-specific model intercepts, which can then inform the intercept to be used when applying the model in practice, even to a population not represented by the included studies. In summary, our framework allows the development (through stratified estimation), implementation in new individuals (through focused intercept choice), and evaluation (through internal-external validation) of a single, integrated prediction model from an IPD-MA, in order to achieve improved model performance and generalizability (see the internal-external cross-validation sketch after this list). Copyright © 2013 John Wiley & Sons, Ltd.
    Statistics in Medicine 01/2013; Impact Factor 2.04
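
The first related abstract proposes stacked regressions for combining several published models on a validation dataset. Below is a minimal Python sketch under simplifying assumptions: each published model is represented only by a hypothetical coefficient vector, the validation data are simulated, and non-negative stacking weights are estimated by maximum likelihood on the validation outcome; the heterogeneity-based weighting mentioned in the abstract is not reproduced here.

```python
# Minimal sketch: stacking previously published logistic models on a
# validation dataset via non-negative weights on their linear predictors.
# Published coefficient vectors and validation data are hypothetical.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical published models: intercept + two predictor coefficients each.
published = np.array([[-1.2, 0.9, 0.4],
                      [-0.8, 0.6, 0.7],
                      [-1.0, 1.1, 0.2]])

# Hypothetical validation data.
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-1.0, 0.8, 0.5])))))

# Each column of Z: one published model's linear predictor on the new data.
Z = X @ published.T

def neg_loglik(w):
    p = 1 / (1 + np.exp(-(Z @ w)))
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Non-negative stacking weights, estimated by maximum likelihood.
res = minimize(neg_loglik, x0=np.full(3, 1 / 3),
               bounds=[(0, None)] * 3, method="L-BFGS-B")
w = res.x
# A linear combination of linear predictors is itself a logistic model,
# so the stacked coefficients are the weighted published coefficients.
stacked_beta = published.T @ w
print("stacking weights:", np.round(w, 3))
print("stacked coefficients:", np.round(stacked_beta, 3))
```

Because the linear predictors are combined linearly, the result is a stand-alone logistic model, in line with the abstract's point that aggregation yields user-friendly stand-alone models.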
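The second related abstract contrasts one-stage and two-stage IPD-MA of a risk-factor association. The sketch below uses simulated studies and two deliberate simplifications: the one-stage fit stratifies the intercept by study instead of fitting a full random-effects GLMM, and the two-stage fit pools per-study log odds ratios with fixed-effect inverse-variance weights.

```python
# Minimal sketch: one-stage versus two-stage estimation of a risk-factor
# log odds ratio from IPD across studies. Data are simulated; the one-stage
# model uses fixed study-specific intercepts rather than random effects.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
studies = []
for k, n in enumerate((150, 250, 200, 300)):
    x = rng.binomial(1, 0.4, size=n)        # binary risk factor
    eta = -1.0 + 0.2 * k + 0.5 * x          # study-varying baseline risk
    y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
    studies.append((k, x, y))

# One-stage: pool all IPD, stratify the intercept by study.
x_all = np.concatenate([s[1] for s in studies])
y_all = np.concatenate([s[2] for s in studies])
study_id = np.concatenate([np.full(len(s[1]), s[0]) for s in studies])
dummies = (study_id[:, None] == np.arange(len(studies))).astype(float)
one_stage = sm.Logit(y_all, np.column_stack([dummies, x_all])).fit(disp=0)
print("one-stage log OR:", round(one_stage.params[-1], 3))

# Two-stage: estimate the log OR per study, then inverse-variance pool.
logors, variances = [], []
for _, x, y in studies:
    res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    logors.append(res.params[1])
    variances.append(res.cov_params()[1, 1])
w = 1 / np.asarray(variances)
print("two-stage log OR:", round(float(np.sum(w * logors) / np.sum(w)), 3))
```

With well-behaved data like these, the two estimates are close; the abstract's point is that they can diverge when cell counts are sparse or heterogeneity is substantial.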
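The third related abstract describes internal-external cross-validation: each study is held out in turn, the model is developed on the remaining studies with stratified intercepts, and discrimination is assessed on the held-out study. A minimal sketch with simulated data follows; applying the mean of the training intercepts to the held-out study is only one of the intercept strategies the abstract alludes to, chosen here for brevity.

```python
# Minimal sketch: internal-external cross-validation of a logistic
# prediction model developed across studies. All data are simulated.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
K = 4
data = []
for k in range(K):
    X = rng.normal(size=(250, 2))
    eta = -1.0 + 0.3 * k + X @ np.array([0.8, 0.5])  # heterogeneous baselines
    y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
    data.append((X, y))

for held_out in range(K):
    # Fit on all other studies, with a separate (stratified) intercept each.
    train = [d for k, d in enumerate(data) if k != held_out]
    X_tr = np.vstack([d[0] for d in train])
    y_tr = np.concatenate([d[1] for d in train])
    sid = np.concatenate([np.full(len(d[1]), i) for i, d in enumerate(train)])
    dummies = (sid[:, None] == np.arange(len(train))).astype(float)
    fit = sm.Logit(y_tr, np.column_stack([dummies, X_tr])).fit(disp=0)
    # Apply to the held-out study using the mean training intercept.
    intercept = fit.params[:len(train)].mean()
    beta = fit.params[len(train):]
    X_te, y_te = data[held_out]
    p = 1 / (1 + np.exp(-(intercept + X_te @ beta)))
    print(f"held-out study {held_out}: c-statistic = "
          f"{roc_auc_score(y_te, p):.3f}")
```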

Full-text available from Sep 10, 2012.