A System for Recognition of Named Entities in Greek.
ABSTRACT: In this paper, we describe work in progress on the development of a Greek named entity recognizer. The system targets information extraction applications that require large-scale text processing. Speed of analysis, system robustness, and accuracy of results have been the basic guidelines for the system's design. Pattern matching techniques have been implemented on top of an existing automated pipeline for Greek text processing, and the resulting system relies on non-recursive regular expressions to capture different types of named entities. For development and testing, we collected a corpus of financial texts from several web sources and manually annotated part of it. Overall precision and recall are 86% and 81%, respectively.
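The abstract does not list the actual patterns, so the following is only a minimal Python sketch, assuming hypothetical flat (non-recursive) regular expressions for a few entity types: dates, percentages, euro amounts, and person names introduced by the Greek titles "κ." (Mr.) or "κα." (Mrs.). It shows how such patterns can be applied in a fixed order, with earlier patterns winning on overlap, and how the output can be scored with exact-match precision and recall. PATTERNS, recognize, and evaluate are illustrative names, not part of the described system.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)

# Hypothetical flat (non-recursive) patterns, one per entity type.
# Dict order matters: earlier patterns win when matches overlap.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PERCENT": re.compile(r"\b\d+(?:,\d+)?\s?%"),
    "MONEY": re.compile(r"\b\d+(?:[.,]\d+)*\s?(?:ευρώ|€)"),
    # Person names introduced by the titles "κ." (Mr.) or "κα." (Mrs.).
    "PERSON": re.compile(r"\bκα?\.\s[Α-ΩΆΈΉΊΌΎΏ][ά-ώ]+"),
}


def recognize(text: str) -> List[Span]:
    """Apply each pattern in turn; keep a match only if it overlaps no earlier one."""
    taken = [False] * len(text)
    spans: List[Span] = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if any(taken[m.start():m.end()]):
                continue
            spans.append((m.start(), m.end(), label))
            for i in range(m.start(), m.end()):
                taken[i] = True
    return sorted(spans)


def evaluate(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


text = "Στις 12/03/2008 ο κ. Παπαδόπουλος ανακοίνωσε άνοδο 5,2% και κέρδη 200 €."
for start, end, label in recognize(text):
    print(label, text[start:end])
```

Since the abstract reports only precision and recall, their harmonic mean can be derived from those figures: F1 = 2 · 0.86 · 0.81 / (0.86 + 0.81) ≈ 0.834, or about 83.4%.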
ABSTRACT: Named entity recognition (NER) is one of the basic tasks in the automatic extraction of information from natural language texts. In this paper, we describe an automatic rule learning method that exploits different features of the input text to identify the named entities in natural language texts. Moreover, we explore the use of morphological features for extracting named entities from Turkish texts. We believe that the developed system can also be used for other agglutinative languages. The paper also provides a comprehensive overview of the field by reviewing the NER research literature. We conducted our experiments on the TurkIE dataset, a corpus of articles collected from different Turkish newspapers. Our method achieved an average F-score of 91.08% on the dataset. The results of the comparative experiments demonstrate that the developed technique is successfully applicable to the task of automatic NER and that exploiting morphological features can significantly improve NER for Turkish, an agglutinative language.
Journal of Information Science 04/2011; 37(2):137-151. DOI:10.1177/0165551511398573 · 1.09 Impact Factor
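The abstract does not describe the learned rules or the exact feature set, so this is only a sketch, assuming a hypothetical feature set, of one morphological cue Turkish text offers: in standard Turkish orthography, inflectional suffixes attached to proper names are separated by an apostrophe (e.g. Ankara'da, "in Ankara"). token_features and is_candidate are illustrative names, and the toy rule in is_candidate stands in for, rather than reproduces, the paper's rule learner.

```python
from typing import Dict, List

# Straight and typographic apostrophes both occur in web text.
APOSTROPHES = ("'", "\u2019")


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical orthographic and morphological features for token i."""
    token = tokens[i]
    stem, suffix = token, ""
    for ap in APOSTROPHES:
        if ap in token:
            # Turkish separates suffixes on proper names with an apostrophe.
            stem, _, suffix = token.partition(ap)
            break
    return {
        "stem": stem,
        "suffix": suffix,
        "has_apostrophe_suffix": bool(suffix),
        "is_capitalized": stem[:1].isupper(),
        "is_sentence_initial": i == 0,
        "prev_token": tokens[i - 1].lower() if i > 0 else "<s>",
    }


def is_candidate(features: Dict[str, object]) -> bool:
    """Toy rule: a capitalized stem is a name candidate unless its only
    evidence of capitalization is sentence-initial position."""
    if not features["is_capitalized"]:
        return False
    return bool(features["has_apostrophe_suffix"]) or not features["is_sentence_initial"]


# "Yesterday an important meeting was held in Ankara."
tokens = ["Dün", "Ankara'da", "önemli", "bir", "toplantı", "yapıldı"]
candidates = [t for i, t in enumerate(tokens) if is_candidate(token_features(tokens, i))]
print(candidates)
```

The apostrophe split gives the stem and suffix directly, which is why such a cue is cheap to exploit compared with full morphological analysis; a real system would combine it with an analyzer's output.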
ABSTRACT: We present a freely available named entity recognizer for Greek texts that identifies temporal expressions, person names, and organization names. For temporal expressions, it relies on semi-automatically produced patterns. For person and organization names, it employs an ensemble of Support Vector Machines that scan the input text in two passes. The ensemble is trained using active learning, whereby the system itself proposes candidate training instances to be annotated by a human during training. The recognizer was evaluated both on a general collection of newspaper articles and on a more topically focused collection of financial articles.
International Journal of Artificial Intelligence Tools 12/2007; 16:1015-1045. DOI:10.1142/S0218213007003680 · 0.32 Impact Factor
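As a hedged illustration of the active-learning step only (not the two-pass scan), one common selection criterion is to propose the unlabeled instances about which the ensemble is least certain. The sketch below assumes each ensemble member is a linear model given as a sparse weight dictionary plus a bias, as a trained linear SVM can be represented, and ranks pool instances by the absolute value of the averaged decision score. The function names and feature names are hypothetical, and the paper's actual selection criterion may differ.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]
LinearModel = Tuple[Features, float]  # (sparse weights, bias)


def decision(model: LinearModel, x: Features) -> float:
    """Signed score w·x + b of one linear classifier."""
    weights, bias = model
    return sum(weights.get(name, 0.0) * value for name, value in x.items()) + bias


def ensemble_decision(models: Sequence[LinearModel], x: Features) -> float:
    """Average the members' scores; values near 0 mean the ensemble is unsure."""
    return sum(decision(m, x) for m in models) / len(models)


def select_queries(models: Sequence[LinearModel], pool: Sequence[Features], k: int) -> List[int]:
    """Indices of the k pool instances closest to the averaged decision boundary."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(ensemble_decision(models, pool[i])))
    return ranked[:k]


# Two hypothetical ensemble members and a small unlabeled pool.
models = [({"capitalized": 1.0, "after_title": 2.0}, -1.0), ({"capitalized": 2.0}, -1.5)]
pool = [
    {"capitalized": 1.0, "after_title": 1.0},
    {"capitalized": 0.0},
    {"capitalized": 1.0},
]
to_annotate = select_queries(models, pool, k=1)
print(to_annotate)
```

In a full loop, the selected instances would be labeled by the human annotator, added to the training set, the ensemble retrained, and the selection repeated until the annotation budget is spent.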
Conference Paper: A Named Entity Recognition approach for Albanian
ABSTRACT: Named Entity Recognition (NER) deals with identifying personal, geographical, organizational, or other entity types in raw text. In this paper we propose the first NER model for the Albanian language. Our model is based on the maximum entropy approach. We manually annotate a corpus in the historical and political domains and train models to generate classifiers that are able to recognize relevant entities in the text. We achieve good precision and recall on the selected domains, despite the scarcity of Albanian corpora and the fact that this paper marks the first NER research for the Albanian language. Experiments demonstrate that the models can be further improved if a richer training corpus is provided.
Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on; 01/2013
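A maximum entropy classifier is equivalent to multinomial logistic regression over indicator features. The following minimal sketch trains such a model by stochastic gradient ascent on the L2-regularized conditional log-likelihood. It assumes hypothetical binary features and a toy label set (PER, LOC, O) rather than the authors' feature set or training setup; the class name MaxEnt and the hyperparameter values are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class MaxEnt:
    """Multinomial logistic regression over binary indicator features."""

    def __init__(self, labels: Sequence[str]) -> None:
        self.labels = list(labels)
        # (feature, label) -> weight; unseen pairs default to 0.0
        self.w: Dict[Tuple[str, str], float] = defaultdict(float)

    def probs(self, feats: Sequence[str]) -> Dict[str, float]:
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}

    def train(self, data: List[Tuple[List[str], str]],
              epochs: int = 50, lr: float = 0.5, l2: float = 0.01) -> None:
        """Stochastic gradient ascent on the L2-regularized conditional log-likelihood.
        For simplicity, the L2 penalty is applied only to weights of active features."""
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    for y in self.labels:
                        observed = 1.0 if y == gold else 0.0
                        key = (f, y)
                        # gradient = observed count - expected count - L2 term
                        self.w[key] += lr * (observed - p[y] - l2 * self.w[key])

    def predict(self, feats: Sequence[str]) -> str:
        p = self.probs(feats)
        return max(p, key=p.get)


# Hypothetical token features, e.g. "prev_word=zoti" (preceded by the Albanian title "zoti").
data = [
    (["is_capitalized", "prev_word=zoti"], "PER"),
    (["is_capitalized", "suffix=ana"], "LOC"),
    (["is_lowercase"], "O"),
]
model = MaxEnt(["PER", "LOC", "O"])
model.train(data)
print(model.predict(["is_capitalized", "prev_word=zoti"]))
```

Because the shared feature is_capitalized fires for both PER and LOC, the model has to rely on the context and suffix features to separate them, which is the kind of overlapping evidence maximum entropy models combine naturally.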