Drug safety data mining with a tree-based scan statistic

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
Pharmacoepidemiology and Drug Safety (Impact Factor: 3.17). 05/2013; DOI: 10.1002/pds.3423
Source: PubMed

ABSTRACT

PURPOSE: In post-marketing drug safety surveillance, data mining can potentially detect rare but serious adverse events. Assessing an entire collection of drug-event pairs is traditionally performed at a predefined level of granularity. It is unknown a priori whether a drug causes one very specific adverse event or a set of related adverse events, such as mitral valve disorders, all valve disorders, or different types of heart disease. This methodological paper evaluates the tree-based scan statistic data mining method to enhance drug safety surveillance.

METHODS: We use a three-million-member electronic health records database from the HMO Research Network. Using the tree-based scan statistic, we assess the safety of selected antifungal and diabetes drugs, simultaneously evaluating overlapping diagnosis groups at different granularity levels and adjusting for multiple testing. Expected and observed adverse event counts were adjusted for age, sex, and health plan, producing a log likelihood ratio test statistic.

RESULTS: Out of 732 evaluated disease groupings, 24 were statistically significant, divided among 10 non-overlapping disease categories. Five of the 10 signals are known adverse effects, four are likely due to confounding by indication, and one may warrant further investigation.

CONCLUSION: The tree-based scan statistic can be successfully applied as a data mining tool in drug safety surveillance using observational data. The total number of statistical signals was modest, and a statistical signal does not imply a causal relationship. Rather, data mining results should be used to generate candidate drug-event pairs for rigorous epidemiological studies to evaluate the individual and comparative safety profiles of drugs.

Copyright © 2013 John Wiley & Sons, Ltd.
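The abstract describes scanning overlapping diagnosis groups at several granularity levels, scoring each with a log likelihood ratio and adjusting for multiple testing. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a conditional Poisson likelihood ratio and a Monte Carlo adjustment based on the maximum score over all nodes; the abstract specifies neither. The tree, the node names, and all counts are hypothetical toy values.

```python
import math
import random

# Hypothetical toy diagnosis tree: each node maps to its parent (None = root).
PARENT = {
    "heart_disease": None,
    "valve_disorders": "heart_disease",
    "mitral_valve": "valve_disorders",
    "aortic_valve": "valve_disorders",
    "arrhythmia": "heart_disease",
}
LEAVES = ["mitral_valve", "aortic_valve", "arrhythmia"]

# Hypothetical leaf-level counts; expected counts sum to the observed total.
OBSERVED = {"mitral_valve": 12, "aortic_valve": 9, "arrhythmia": 19}
EXPECTED = {"mitral_valve": 5.0, "aortic_valve": 5.0, "arrhythmia": 30.0}


def node_totals(leaf_values):
    """Add each leaf value to the leaf itself and to all of its ancestors."""
    totals = {node: 0.0 for node in PARENT}
    for leaf, value in leaf_values.items():
        node = leaf
        while node is not None:
            totals[node] += value
            node = PARENT[node]
    return totals


def llr(c, n, total):
    """Assumed conditional Poisson log likelihood ratio; excess risk only."""
    if n <= 0 or c <= n:
        return 0.0
    rest = total - c
    value = c * math.log(c / n)
    if rest > 0:
        value += rest * math.log(rest / (total - n))
    return value


def max_llr(observed, expected):
    """Return the highest-scoring node and its log likelihood ratio."""
    total = sum(observed.values())
    obs = node_totals(observed)
    exp = node_totals(expected)
    scores = {node: llr(obs[node], exp[node], total) for node in PARENT}
    best = max(scores, key=scores.get)
    return best, scores[best]


def monte_carlo_p(observed, expected, replicates=999, seed=0):
    """Adjust for multiple testing by comparing against simulated maxima."""
    rng = random.Random(seed)
    total = sum(observed.values())
    _, real_score = max_llr(observed, expected)
    weights = [expected[leaf] for leaf in LEAVES]
    exceed = 0
    for _ in range(replicates):
        # Redistribute all events across leaves under the null hypothesis.
        draws = rng.choices(LEAVES, weights=weights, k=total)
        simulated = {leaf: draws.count(leaf) for leaf in LEAVES}
        _, sim_score = max_llr(simulated, expected)
        if sim_score >= real_score:
            exceed += 1
    return (exceed + 1) / (replicates + 1)


if __name__ == "__main__":
    node, score = max_llr(OBSERVED, EXPECTED)
    print(node, round(score, 3), monte_carlo_p(OBSERVED, EXPECTED))
```

In this toy data the intermediate grouping scores higher than either of its leaves, which illustrates why evaluating several granularity levels at once can matter. Because each simulated replicate takes the maximum over every node, the resulting p-value already accounts for the overlapping groupings.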

  • ABSTRACT: Objective: Several disproportionality analysis methods are widely used for signal detection. The goal of this study was to compare the concordance of the performance characteristics of these methods in the spontaneous reporting system of China. Methods: The reporting odds ratio (ROR), the proportional reporting ratio (PRR), the information component (IC), and a composite criterion previously used by the Medicines and Healthcare Products Regulatory Agency (MHRA) were compared. The kappa coefficient was used to measure concordance. Reports received in 2004 and 2005 were extracted for analysis. Results: After data processing, 361,872 reports representing 52,769 combinations were analysed. The analysis generated 24,022, 22,646, 5,637 and 5,302 signals of disproportionality by PRR, ROR, MHRA and IC, respectively. The kappa coefficient increased with the threshold for the number of reports per drug-adverse drug reaction (ADR) combination, and exceeded 0.7 when the number of reports per suspected drug-ADR combination exceeded 2. Conclusion: This study shows that the different measures are broadly comparable in the spontaneous reporting system of China when two or more cases per combination have been collected.
    Expert Opinion on Drug Safety 06/2014; DOI:10.1517/14740338.2014.915938 · 2.74 Impact Factor
  • ABSTRACT: A limited amount of data is typically available to support product license applications. That is further complicated by the need to make some medicines available to patients at key times, using expedited drug approval pathways. In addition, increasing immunogenicity concerns have been paralleled by a myriad of biotherapeutics entering development and/or receiving regulatory approval. Postmarketing patient safety is intrinsically dependent on the correct balance of economics, regulatory oversight and legal and enforcement issues. Here, we discuss the changing landscape of pharmacovigilance, with special emphasis on postmarketing commitments and requirements, megadata analysis, regulatory responsibilities and research opportunities. Challenges and possibilities are illustrated with therapeutic drugs approved for treatment of autoimmune diseases, diabetes, cancer, rare diseases and the resurgence of gene therapy.
    Drug Discovery Today 07/2014; 19(12). DOI:10.1016/j.drudis.2014.07.011 · 5.96 Impact Factor
  • ABSTRACT: Large healthcare databases maintained by health plans have been widely used to conduct customized protocol-based epidemiological safety studies as well as targeted routine sequential monitoring of suspected adverse events for newly licensed vaccines. These databases also offer a rich data source to discover vaccine-related adverse events not known prior to licensure using data mining methods, but they remain relatively under-utilized for this purpose. Initial safety applications of data mining methods using ‘big healthcare data’ are promising, but stronger integration of database expertise, epidemiological design, and statistical analysis strategies is needed to better leverage the available information, reduce bias, and improve reporting transparency. We enumerate major methodological challenges in mining large healthcare databases for vaccine safety research, describe existing strategies that have been used to address these issues, and identify opportunities for methodological advancements that emphasize the importance of adapting techniques used in customized protocol-based vaccine safety assessments. Investment in such research methods and in the development of deeper collaborations between database safety experts and data mining methodologists has great potential to improve existing safety surveillance programs and further increase public confidence in the safety of newly licensed vaccines.
    Statistical Analysis and Data Mining 10/2014; 7(5). DOI:10.1002/sam.11232
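
The first related abstract above compares disproportionality measures and uses the kappa coefficient to gauge their concordance. As a rough illustration only, the sketch below computes the standard PRR and ROR point estimates from a 2x2 report table and Cohen's kappa between two binary signal lists. The function names, the cell labels (a, b, c, d), and the example counts are assumptions for illustration; the signalling thresholds and the IC and MHRA criteria used in that study are not reproduced here.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio for the same 2x2 table."""
    return (a * d) / (b * c)


def cohen_kappa(x, y):
    """Cohen's kappa for two equal-length lists of 0/1 signal flags."""
    n = len(x)
    observed = sum(1 for xi, yi in zip(x, y) if xi == yi) / n
    px = sum(x) / n  # share flagged by the first method
    py = sum(y) / n  # share flagged by the second method
    chance = px * py + (1 - px) * (1 - py)
    return (observed - chance) / (1 - chance)


if __name__ == "__main__":
    # Hypothetical counts for one drug-ADR combination.
    print(prr(20, 80, 100, 9800), ror(20, 80, 100, 9800))
    # Hypothetical signal flags from two methods over six combinations.
    print(cohen_kappa([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0]))
```

A kappa near 1 would mean the two methods flag nearly the same combinations, while a value near 0 would mean agreement no better than chance.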

