Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection
ABSTRACT The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.
We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N(pos)=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets: important keywords, low-level patterns, and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations.
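As a rough illustration of the keyword feature representation described above, the following minimal sketch turns a report's symptom text into keyword counts and applies a toy rule. The keyword list, the function names (`keyword_features`, `rule_flag`), and the threshold are all hypothetical; the abstract does not list the terms or rules actually used in the study.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon (an assumption for illustration only);
# the study's actual keywords are not given in the abstract.
KEYWORDS = ["hives", "wheezing", "swelling", "epinephrine", "rash"]


def keyword_features(symptom_text, keywords=KEYWORDS):
    """Return one count per keyword for a single report's symptom text."""
    tokens = re.findall(r"[a-z]+", symptom_text.lower())
    counts = Counter(tokens)
    return [counts[k] for k in keywords]


def rule_flag(features, min_distinct=2):
    """Toy rule: flag a report when at least `min_distinct` different
    keywords occur. This is not the Brighton Collaboration definition."""
    return sum(1 for c in features if c > 0) >= min_distinct


report = "Patient developed hives and wheezing; epinephrine given. Hives resolved."
features = keyword_features(report)
flagged = rule_flag(features)
```

A vector like `features` could feed either a hand-written rule, as here, or a trained classifier; which keywords carry signal is exactly what the feature selection step would have to decide.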
Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed.
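The macro-averaged metrics named above can be sketched as follows. This is a minimal illustration, assuming that macro-averaging means the unweighted mean of per-class recall, precision, and F-measure (macro F-measure is sometimes instead computed from macro-precision and macro-recall; the abstract does not say which variant was used). The function names and the toy labels are assumptions, not the study's data.

```python
def per_class_counts(y_true, y_pred, label):
    """Count true positives, false positives, and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def macro_scores(y_true, y_pred):
    """Return (macro-recall, macro-precision, macro-F) as unweighted class means."""
    labels = sorted(set(y_true))
    recalls, precisions, f_measures = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f_measures.append(f_measure)
    n = len(labels)
    return sum(recalls) / n, sum(precisions) / n, sum(f_measures) / n


def misclassification_error_rate(y_true, y_pred):
    """Fraction of reports whose predicted label differs from the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


# Toy labels for illustration only.
y_true = ["pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "neg", "pos"]
macro_recall, macro_precision, macro_f = macro_scores(y_true, y_pred)
error_rate = misclassification_error_rate(y_true, y_pred)
```

Because each class contributes equally to the mean, macro-averaging keeps the rare positive class from being swamped by the much larger negative class, which matters when only 237 of 6034 reports are potentially positive.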
The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, but at the expense of a higher mean misclassification error rate. The rule-based classifier achieved an average sensitivity of 79.05% and an average specificity of 94.80%.
Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
- "Weights can also be applied to the variables. The Brighton Collaboration case definition has been implemented as a rule-based program and using features representing the clinical concepts extracted from narrative case descriptions, this program has worked reasonably well to classify cases"
ABSTRACT: Safety of medical products is a major public health concern. We present a critical discussion of the currently used analytical tools for mining spontaneous reporting systems (SRS) to identify safety signals after use of medical products. We introduce a pattern discovery framework for the analysis of SRS. The terminology ‘pattern discovery’ is borrowed from the engineering and artificial intelligence literature and signifies that the basis of the proposed framework is the medical case, formalizing the cognitive paradigm known to clinicians who evaluate individual patients and individual case safety reports submitted to SRS. The fundamental contribution of this approach is a strong probabilistic component that may account for selection and other biases and facilitates rigorous modeling and inference. We discuss somewhat in depth the concept of signal in pharmacovigilance and connect it with the concept of a pattern; we illustrate this conceptual framework using the example of anaphylaxis. Finally, we propose a research agenda in statistics, informatics, and pharmacovigilance practices needed to advance the pattern discovery framework in both the short and long terms.
Statistical Analysis and Data Mining 10/2014; 7(5). DOI:10.1002/sam.11233
- ABSTRACT: The Decade of Vaccines Collaboration and development of the Global Vaccine Action Plan provides a catalyst and unique opportunity for regulators worldwide to develop and propose a global regulatory science agenda for vaccines. Regulatory oversight is critical to allow access to vaccines that are safe, effective, and of assured quality. Methods used by regulators need to constantly evolve so that scientific and technological advances are applied to address challenges such as new products and technologies, and also to provide an increased understanding of benefits and risks of existing products. Regulatory science builds on high-quality basic research, and encompasses at least two broad categories. First, there is laboratory-based regulatory science. Illustrative examples include development of correlates of immunity; or correlates of safety; or of improved product characterization and potency assays. Included in such science would be tools to standardize assays used for regulatory purposes. Second, there is science to develop regulatory processes. Illustrative examples include adaptive clinical trial designs; or tools to analyze the benefit-risk decision-making process of regulators; or novel pharmacovigilance methodologies. Included in such science would be initiatives to standardize regulatory processes (e.g., definitions of terms for adverse events [AEs] following immunization). The aim of a global regulatory science agenda is to transform current national efforts, mainly by well-resourced regulatory agencies, into a coordinated action plan to support global immunization goals. This article provides examples of how regulatory science has, in the past, contributed to improved access to vaccines, and identifies gaps that could be addressed through a global regulatory science agenda. The article also identifies challenges to implementing a regulatory science agenda and proposes strategies and actions to fill these gaps. A global regulatory science agenda will enable regulators, academics, and other stakeholders to converge around transformative actions for innovation in the regulatory process to support global immunization goals.
Vaccine 04/2013; 31:B163–B175. DOI:10.1016/j.vaccine.2012.10.117 · 3.49 Impact Factor
- ABSTRACT: We present and test the intuition that letters to the editor in journals carry early signals of adverse drug events (ADEs). Surprisingly, these letters have not yet been exploited for automatic ADE detection, unlike, for example, clinical records and PubMed. Part of the challenge is that it is not easy to access the full text of letters (for the most part these do not appear in PubMed). Also, letters are likely underrated in comparison with full articles. Besides demonstrating that this intuition holds, we contribute techniques for post-market drug surveillance. Specifically, we test an automatic approach for ADE detection from letters using off-the-shelf machine learning tools. We also involve natural language processing for feature definitions. Overall, we achieve high accuracy in our experiments, and our method also works well on a second new test set. Our results encourage us to further pursue this line of research.
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2012; 2012:1030-9.