Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection.
ABSTRACT The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.
We selected 6034 VAERS reports for H1N1 vaccine that medical officers had classified as potentially positive (N(pos)=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets: important keywords, low-level patterns, and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained on the remaining two feature representations.
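To make the keyword feature representation concrete, the following is a minimal sketch in Python. It assumes a small hypothetical lexicon (`KEYWORDS`) and a toy threshold rule (`toy_rule`); neither the terms nor the rule come from the study, whose actual feature sets and rules are not given in this abstract.

```python
import re

# Hypothetical lexicon for illustration only; NOT the study's feature set.
KEYWORDS = ["hives", "wheezing", "swelling", "rash", "hypotension"]


def keyword_features(symptom_text):
    """Return a binary vector marking which lexicon terms occur in the text."""
    tokens = set(re.findall(r"[a-z]+", symptom_text.lower()))
    return [1 if keyword in tokens else 0 for keyword in KEYWORDS]


def toy_rule(features, min_hits=2):
    """Flag a report as potentially positive when enough terms are present.

    The threshold is an assumption for illustration, not a case definition.
    """
    return sum(features) >= min_hits


report = "Patient developed hives and wheezing within 10 minutes of vaccination."
features = keyword_features(report)
print(features, toy_rule(features))
```

A binary vector of this kind could feed either a hand-written rule, as here, or a trained classifier; which terms to include is exactly the feature selection question the abstract raises.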
Classifier performance was evaluated using macro-averaged recall, precision, and F-measure, together with Friedman's test; misclassification error rate analysis was also performed.
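As an illustration of macro-averaging, the sketch below computes recall, precision, and F-measure per class and then takes their unweighted mean, so the rare positive class counts as much as the common negative class. The function name `macro_scores` and the toy labels are assumptions for this example, and macro F-measure is taken here as the mean of per-class F-measures, which is one of several conventions; the abstract does not say which one the authors used.

```python
def macro_scores(y_true, y_pred):
    """Compute macro-averaged recall, precision, and F-measure.

    Each class is scored separately and the per-class scores are averaged
    with equal weight, regardless of how many examples each class has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    recalls, precisions, f_measures = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f_measures.append(f_measure)
    n = len(labels)
    return {
        "recall": sum(recalls) / n,
        "precision": sum(precisions) / n,
        "f_measure": sum(f_measures) / n,
    }


# Toy labels (made up for illustration, not study data).
y_true = ["pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "neg", "pos"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

For a two-class problem such as this one, macro-recall is the mean of sensitivity and specificity, which is why it is a natural headline metric when positives are rare.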
The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, but at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively).
Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
- ABSTRACT: Text mining is the computational process of extracting meaningful information from large amounts of unstructured text. It is emerging as a tool to leverage underutilized data sources that can improve pharmacovigilance, including the objective of adverse drug event (ADE) detection and assessment. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining, and discusses several data sources (such as biomedical literature, clinical narratives, product labeling, social media, and Web search logs) that are amenable to text mining for pharmacovigilance. Given the state of the art, it appears text mining can be applied to extract useful ADE-related information from multiple textual sources. Nonetheless, further research is required to address remaining technical challenges associated with the text mining methodologies, and to conclusively determine the relative contribution of each textual source to improving pharmacovigilance. Drug Safety 08/2014.
- ABSTRACT: Automated syndrome classification aims to aid near real-time syndromic surveillance to serve as an early warning system for disease outbreaks, using Emergency Department (ED) data. We present a system that improves the automatic classification of an ED record with a triage note into one or more syndrome categories using the vector space model coupled with a 'learning' module that employs a pseudo-relevance feedback mechanism. Online Journal of Public Health Informatics 10/2014; 6(2):e178.