A Pilot Study Using Machine Learning and Domain Knowledge to Facilitate Comparative Effectiveness Review Updating

Southern California Evidence-based Practice Center, RAND Corporation, Santa Monica, CA (SRD, PGS, SH, SJN, AM, KDS).
Medical Decision Making (Impact Factor: 3.24). 09/2012; 33(3). DOI: 10.1177/0272989X12457243
Source: PubMed


BACKGROUND: Comparative effectiveness and systematic reviews require frequent and time-consuming updating. Results of earlier screening should be useful in reducing the effort needed to screen relevant articles. METHODS: We collected 16,707 PubMed citation classification decisions from 2 comparative effectiveness reviews: interventions to prevent fractures in low bone density (LBD) and off-label uses of atypical antipsychotic drugs (AAP). We used previously written search strategies to guide extraction of a limited number of explanatory variables pertaining to the intervention, outcome, and study design. We empirically derived statistical models (based on a sparse generalized linear model with convex penalties [GLMnet] and a gradient boosting machine [GBM]) that predicted article relevance. We evaluated model sensitivity, positive predictive value (PPV), and screening workload reductions using 11,003 PubMed citations retrieved for the LBD and AAP updates. RESULTS: GLMnet-based models performed slightly better than GBM-based models. When attempting to maximize sensitivity for all relevant articles, GLMnet-based models achieved high sensitivities (0.99 and 1.0 for AAP and LBD, respectively) while reducing projected screening by 55.4% and 63.2%. The GLMnet-based model yielded sensitivities of 0.921 and 0.905 and PPVs of 0.185 and 0.102 when predicting articles relevant to the AAP and LBD efficacy/effectiveness analyses, respectively (using a threshold of P ≥ 0.02). GLMnet performed better when identifying adverse-effect-relevant articles for the AAP review (sensitivity = 0.981) than for the LBD review (0.685). LIMITATIONS: The system currently requires MEDLINE-indexed articles. CONCLUSIONS: We evaluated statistical classifiers that used previous classification decisions and explanatory variables derived from MEDLINE indexing terms to predict inclusion decisions. This pilot system reduced the workload associated with screening 2 simulated comparative effectiveness review updates by more than 50% with minimal loss of relevant articles.
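The screening-prioritization setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, and scikit-learn's elastic-net logistic regression and GradientBoostingClassifier are assumed stand-ins for the GLMnet and GBM models named above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for binary MEDLINE indexing-term features and
# prior inclusion decisions (1 = relevant to the review update).
rng = np.random.default_rng(0)
n_cit, n_terms = 3000, 40
X = rng.integers(0, 2, size=(n_cit, n_terms)).astype(float)
logit = 4.0 * X[:, 0] + 3.0 * X[:, 1] - 6.0       # a few informative terms
y = (rng.random(n_cit) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_old, y_old = X[:2000], y[:2000]   # decisions from the original review
X_new, y_new = X[2000:], y[2000:]   # citations retrieved for the update

# Elastic-net logistic regression approximates a sparse GLM with convex
# penalties (GLMnet); GradientBoostingClassifier approximates a GBM.
glm = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000).fit(X_old, y_old)
gbm = GradientBoostingClassifier(random_state=0).fit(X_old, y_old)

def screening_metrics(model, threshold=0.02):
    """Sensitivity, PPV, and workload reduction when screening only
    citations with predicted relevance P >= threshold."""
    flagged = model.predict_proba(X_new)[:, 1] >= threshold
    tp = int(np.sum(flagged & (y_new == 1)))
    sensitivity = tp / max(int(y_new.sum()), 1)
    ppv = tp / max(int(flagged.sum()), 1)
    workload_reduction = 1.0 - flagged.mean()  # citations never hand-screened
    return sensitivity, ppv, workload_reduction

for name, model in [("GLMnet-like", glm), ("GBM-like", gbm)]:
    sens, ppv, wr = screening_metrics(model)
    print(f"{name}: sensitivity={sens:.3f} PPV={ppv:.3f} workload_reduction={wr:.1%}")
```

A low threshold such as P ≥ 0.02 trades PPV for sensitivity: nearly all relevant citations stay in the screening queue, while the clearly irrelevant tail is skipped, which is how workload reductions above 50% can coexist with sensitivities near 1.0.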




  • Article · Apr 2013 · Medical Decision Making
  • ABSTRACT: Background: Incentives offered by the U.S. government have spurred marked increases in the use of health information technology (IT). Purpose: To update previous reviews and examine recent evidence relating the health IT functionalities prescribed in meaningful use regulations to key aspects of health care. Data Sources: English-language articles in PubMed from January 2010 to August 2013. Study Selection: 236 studies, including pre-post and time-series designs and clinical trials, that related the use of health IT to quality, safety, or efficiency. Data Extraction: Two independent reviewers extracted data on functionality, study outcomes, and context. Data Synthesis: Fifty-seven percent of the 236 studies evaluated clinical decision support and computerized provider order entry, whereas other meaningful use functionalities were rarely evaluated. Fifty-six percent of studies reported uniformly positive results, and an additional 21% reported mixed-positive effects. Reporting of context and implementation details was poor, and 61% of studies did not report any contextual details beyond basic information. Limitation: Potential for publication bias; the evaluated health IT systems and outcomes were heterogeneous and incompletely described. Conclusion: Strong evidence supports the use of clinical decision support and computerized provider order entry. However, insufficient reporting of implementation and context of use makes it impossible to determine why some health IT implementations are successful and others are not. The most important improvement that can be made in health IT evaluations is increased reporting of the effects of implementation and context. Primary Funding Source: Office of the National Coordinator.
    Article · Jan 2014 · Annals of Internal Medicine
  • ABSTRACT: Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on the performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance.
    Full-text · Article · Jan 2014 · PLoS ONE
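The feature-comparison idea in that last abstract can be sketched with a toy experiment. The citations, labels, and TF-IDF/logistic-regression screener below are illustrative assumptions, not the study's corpus or code; the point is only that alternative textual feature sets (titles alone vs. titles plus abstracts) can be scored the same way and compared.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled citations (1 = include in the review).
titles = [
    "Randomized trial of drug A for hypertension",
    "Drug A versus placebo in adults with hypertension",
    "Cohort study of drug A adherence",
    "Cost analysis of hypertension clinics",
    "Editorial: the future of cardiology",
    "Survey of patient attitudes to exercise",
    "Randomized trial of drug A dosing schedules",
    "Opinion: reimbursement policy for clinics",
]
abstracts = [
    "Double-blind randomized controlled trial of drug A.",
    "Placebo-controlled efficacy trial in hypertensive adults.",
    "Observational adherence data, no control group.",
    "Economic modeling study, no clinical outcomes.",
    "Narrative commentary without data.",
    "Cross-sectional survey, unrelated condition.",
    "Randomized comparison of two dosing schedules.",
    "Policy commentary without clinical data.",
]
labels = [1, 1, 0, 0, 0, 0, 1, 0]

def fit_and_score(texts):
    """Leave-one-out accuracy for a TF-IDF + logistic regression screener."""
    correct = 0
    for i in range(len(texts)):
        train_x = texts[:i] + texts[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(train_x, train_y)
        correct += int(clf.predict([texts[i]])[0] == labels[i])
    return correct / len(texts)

title_only = fit_and_score(titles)
title_abstract = fit_and_score([t + " " + a for t, a in zip(titles, abstracts)])
print(f"titles only: {title_only:.2f}, titles+abstracts: {title_abstract:.2f}")
```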