Conference Paper

The Development of a Schema for the Annotation of Terms in the Biocaster Disease Detecting/Tracking System.

Conference: KR-MED 2006, Formal Biomedical Knowledge Representation, Proceedings of the Second International Workshop on Formal Biomedical Knowledge Representation: "Biomedical Ontology in Action" (KR-MED 2006), Collocated with the 4th International Conference on Formal Ontology in Information Systems (FOIS-2006), Baltimore, Maryland, USA, November 8, 2006
Source: DBLP


Available from: Roberto A Barrero, Aug 07, 2015
  • Source
    • "F-score improved from 76.96% to 79.96% for all classes. In particular, Kawazoe et al. observed a large improvement in PERSON F-score from 59.95% to 66.28% and NONHUMAN from 68.0% to 73.21% [16]"
    ABSTRACT: This paper explores the role of named entities (NEs) in the classification of disease outbreak reports. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology and classified into types and roles. Types are specified as NE classes, and roles are integrated into NEs as attributes; for example, whether a chemical is being used as a therapy for some infectious disease. We focus on the roles of NEs and explore different ways to extract, combine and use them as features in a text classifier. In addition, we investigate the combination of roles with semantic categories of disease-related nouns and verbs. Experimental results using naïve Bayes and Support Vector Machine (SVM) algorithms show that: (1) roles in combination with NEs improve performance in text classification, and (2) roles in combination with semantic categories of noun and verb features contribute substantially to the improvement of text classification. Both results were statistically significant compared to the baseline "raw text" representation. We discuss in detail the effects of roles on each NE and on semantic categories of noun and verb features in terms of accuracy, precision/recall and F-score for the text classification task.
    Journal of Biomedical Informatics 10/2009; 42(5):773-80. DOI:10.1016/j.jbi.2008.12.009 · 2.48 Impact Factor
  • Source
    • "<NAME cl="ORGANIZATION">WHO</NAME> is seeking confirmation and further information from the <NAME cl="ORGANIZATION">Ministry of Health</NAME>. </DOC>
      Figure 1: Example Annotated Entry from the BioCaster Corpus
      … a truncated example, and Kawazoe et al. (2006) for a description of the annotation scheme). The corpus consists of around 290,000 words (excluding annotation)."
    ABSTRACT: This paper explores the benefits of using n-grams and semantic features for the classification of disease outbreak reports, in the context of the BioCaster disease outbreak report text mining system. A novel feature of this work is the use of a general-purpose semantic tagger, the USAS tagger, to generate features. We outline the application context for this work (the BioCaster epidemiological text mining system) before describing the experimental data used in our classification experiments (the 1000-document BioCaster corpus). FEATURE SETS: Three broad groups of features are used in this work: Named Entity based features, n-gram features, and features derived from the USAS semantic tagger. Three standard machine learning algorithms (Naïve Bayes, the Support Vector Machine algorithm, and the C4.5 decision tree algorithm) were used to classify the experimental data (the BioCaster corpus). Feature selection was performed using the χ² (chi-squared) feature selection algorithm. Standard text classification performance metrics (Accuracy, Precision, Recall, Specificity and F-score) are reported. A feature representation composed of unigrams, bigrams, trigrams and features derived from a semantic tagger, in conjunction with the Naïve Bayes algorithm and feature selection, yielded the highest classification accuracy (and F-score). This result was statistically significant compared to a baseline unigram representation and to previous work on the same task. However, it was feature selection rather than semantic tagging that contributed most to the improved performance. This study has shown that, for the classification of disease outbreak reports, a combination of bag-of-words, n-grams and semantic features, in conjunction with feature selection, increases classification accuracy at a statistically significant level compared to previous work in this domain.
    International Journal of Medical Informatics 06/2009; 78(12):e47-58. DOI:10.1016/j.ijmedinf.2009.03.010 · 2.72 Impact Factor
  • ABSTRACT: We present the Global Health Monitor, an online, Web-based system for detecting and mapping infectious disease outbreaks that appear in news stories. The system analyzes English news stories from news feed providers, classifies them for topical relevance and plots them onto a Google map using geo-coding information, helping public health workers to monitor the spread of diseases in a geo-temporal context. The background knowledge for the system is contained in the BioCaster ontology (BCO) (Collier et al., 2007a), which includes information both on infectious diseases and on geographical locations with their latitudes/longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection and visualization. Evaluation of the system shows that it achieved high accuracy on a gold standard corpus. The system is now …
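The citing works above describe turning BioCaster NE annotations (NE classes, plus role attributes) into features for a naïve Bayes text classifier, and one excerpt shows the inline markup `<NAME cl="ORGANIZATION">…</NAME>`. The following is a minimal sketch of that idea, not the cited authors' implementation. Assumptions: roles are stored in an attribute named `role` (hypothetical; the excerpt only shows `cl`), the classifier is a multinomial naïve Bayes with add-one smoothing, and the training sentences are invented toy data rather than BioCaster corpus entries.

```python
import math
import re
from collections import Counter, defaultdict

# Inline NE markup in the style of the corpus excerpt, e.g.
# <NAME cl="ORGANIZATION">WHO</NAME>. All attributes are captured generically.
NAME_RE = re.compile(r"<NAME\s+([^>]*)>(.*?)</NAME>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
TAG_RE = re.compile(r"<[^>]+>")
ROLE_ATTR = "role"  # HYPOTHETICAL attribute name for an NE's role


def extract_features(annotated: str) -> Counter:
    """Bag-of-words features plus NE-class and (optional) NE-class+role features."""
    feats = Counter()
    for attrs, _surface in NAME_RE.findall(annotated):
        attr = dict(ATTR_RE.findall(attrs))
        cls = attr.get("cl")
        if cls is None:
            continue
        feats["NE=" + cls] += 1
        if ROLE_ATTR in attr:  # combine class and role into one feature
            feats["NE=%s|ROLE=%s" % (cls, attr[ROLE_ATTR])] += 1
    # Strip all tags (including </DOC>) and add lower-cased word tokens.
    plain = TAG_RE.sub(" ", annotated).lower()
    feats.update("w=" + tok for tok in re.findall(r"[a-z]+", plain))
    return feats


class MultinomialNB:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, y in zip(docs, labels):
            self.feat_counts[y].update(feats)
            self.vocab.update(feats)
        self.totals = {y: sum(self.feat_counts[y].values()) for y in self.class_counts}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / n_docs)  # log prior P(y)
            for f, k in feats.items():
                if f not in self.vocab:  # ignore features unseen in training
                    continue
                p = (self.feat_counts[y][f] + self.alpha) / (self.totals[y] + self.alpha * v)
                score += k * math.log(p)  # log likelihood of each feature occurrence
            if score > best_score:
                best, best_score = y, score
        return best


# Toy, invented examples (not BioCaster data).
TRAIN = [
    ('<NAME cl="ORGANIZATION">WHO</NAME> reports a cholera outbreak', "relevant"),
    ('<NAME cl="ORGANIZATION">Ministry of Health</NAME> confirms influenza cases', "relevant"),
    ("football match ends in a draw", "irrelevant"),
    ("stock market closes higher", "irrelevant"),
]
model = MultinomialNB().fit([extract_features(t) for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict(extract_features('<NAME cl="ORGANIZATION">WHO</NAME> confirms new cases')))
# prints: relevant
```

Keeping the NE class and the class+role pair as separate features lets the classifier use the role information when it is present without losing the plain class signal when it is not; the cited papers may combine them differently.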
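The International Journal of Medical Informatics abstract above reports χ² feature selection and the metrics Accuracy, Precision, Recall, Specificity and F-score. The following is a minimal sketch of those pieces, assuming the common 2×2 term/class contingency form of χ², χ² = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)), and the F-score taken as F1 = 2PR/(P+R). Whether the cited study used exactly this formulation is an assumption. Documents are represented as sets of feature strings, and the function names are illustrative.

```python
from typing import Dict, Hashable, List, Sequence, Set


def chi_square(docs: Sequence[Set[str]], labels: Sequence[Hashable],
               term: str, positive: Hashable) -> float:
    """2x2 chi-square between presence of `term` and membership in class `positive`."""
    a = b = c = d = 0  # a: pos&has, b: neg&has, c: pos&lacks, d: neg&lacks
    for feats, y in zip(docs, labels):
        has = term in feats
        if y == positive:
            if has:
                a += 1
            else:
                c += 1
        elif has:
            b += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_k_best(docs: Sequence[Set[str]], labels: Sequence[Hashable],
                  positive: Hashable, k: int) -> List[str]:
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, positive), t))
    return ranked[:k]


def _div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising on an empty denominator."""
    return num / den if den else 0.0


def binary_metrics(gold: Sequence[Hashable], pred: Sequence[Hashable],
                   positive: Hashable) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity and F1 for one positive class."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, pred):
        if p == positive:
            if g == positive:
                tp += 1
            else:
                fp += 1
        elif g == positive:
            fn += 1
        else:
            tn += 1
    precision = _div(tp, tp + fp)
    recall = _div(tp, tp + fn)
    return {
        "accuracy": _div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _div(tn, tn + fp),
        "f_score": _div(2 * precision * recall, precision + recall),
    }


# Toy, invented example: NE=ORGANIZATION appears only in the "relevant" documents.
DOCS = [{"w=cholera", "NE=ORGANIZATION"}, {"w=influenza", "NE=ORGANIZATION"},
        {"w=football"}, {"w=stock"}]
LABELS = ["relevant", "relevant", "irrelevant", "irrelevant"]
print(select_k_best(DOCS, LABELS, "relevant", 1))  # prints: ['NE=ORGANIZATION']
```

Scoring each term against the positive class and keeping the top k is one simple way to apply χ² selection; multi-class settings often take the maximum or weighted average of per-class scores instead.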