A System for Recognition of Named Entities in Greek.
ABSTRACT: In this paper, we describe work in progress on the development of a Greek named entity recognizer. The system aims at information
extraction applications where large-scale text processing is needed. Speed of analysis, system robustness, and accuracy of results
have been the basic guidelines for the system’s design. Pattern matching techniques have been implemented on top of an existing
automated pipeline for Greek text processing and the resulting system depends on non-recursive regular expressions in order
to capture different types of named entities. For development and testing purposes, we collected a corpus of financial texts
from several web sources and manually annotated part of it. Overall precision and recall are 86% and 81% respectively.
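As a rough illustration of the pattern-matching approach described in the abstract, the following minimal sketch applies a few flat (non-recursive) regular expressions to raw text. The abstract does not give the system's actual patterns, entity types, or pipeline interface, so the labels, the suffix and currency lists, and the function name `find_entities` are all assumptions made for illustration only.

```python
import re

# Hypothetical patterns: the entity labels, company suffixes, and currency
# words below are illustrative assumptions, not taken from the paper.
# Each pattern is a flat regular expression with no recursion.
PATTERNS = [
    # One or more capitalized words followed by a company-type keyword.
    ("ORG", re.compile(r"(?:[A-ZΑ-Ω]\w*\s)+(?:Α\.Ε\.|S\.A\.|Bank)")),
    # A number (with optional . or , separators) followed by a currency word.
    ("MONEY", re.compile(r"\d+(?:[.,]\d+)*\s(?:ευρώ|euro|EUR)")),
    # A numeric date of the form D/M/YYYY.
    ("DATE", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),
]


def find_entities(text):
    """Return (start, end, label, surface) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)


if __name__ == "__main__":
    sample = "Alpha Bank paid 1.500 euro on 12/03/2004."
    for start, end, label, surface in find_entities(sample):
        print(label, surface)
```

Because every pattern is a single pass over the text, a sketch like this stays fast on large inputs, which matches the speed and robustness goals stated above. A real system would presumably match over the tokenized and tagged output of the processing pipeline rather than over raw strings.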
- ABSTRACT: Named-entity recognition (NER), as a part of information extraction, is now used in information retrieval as well as in question-answering systems. Unlike ordinary words, named entities (NEs) are continually created and changed in documents on the Web, in newspapers, and so on. This generation of new NEs causes an unknown-word problem and creates difficulties for many application systems that use NER. To alleviate this problem, this paper proposes a new feature generation method for machine learning-based NER. In general, features in machine learning-based NER are related to words, whereas entries in named-entity dictionaries are phrases, so the entries cannot be used directly as features of NER systems. This paper proposes an encoding scheme, as a feature generation method, that converts phrase entities into word-unit features. Furthermore, thanks to this scheme, entities with semantic information in WordNet can be converted into features of NER systems. Our experiments show that performance increases by about 6% in F1 score and that errors are reduced by about 38%. Journal of Information Management. 04/2010; 41(2). Source available from: ocean.kisti.re.kr
- ABSTRACT: Named Entity Recognition, an important subject of Natural Language Processing, is a key technology for information extraction, information retrieval, question answering, and other text processing applications. In this study, we evaluate well-established association measures as an initial attempt to extract two-word named entities from a Turkish corpus. Furthermore, we propose a new association measure and compare it with the other methods. The methods are evaluated using precision and recall.
- ABSTRACT: We describe our work on Greek Named Entity Recognition, comparing three different machine learning techniques: (i) Support Vector Machines (SVM), (ii) Maximum Entropy, and (iii) Onetime, a shortcut method based on previous work by one of the authors. The majority of our system's features use linguistic knowledge provided by morphology, punctuation, the position of lexical units within a sentence and within a text, electronic dictionaries, and the outputs of external tools (a tokenizer, a sentence splitter, and a Hellenic version of Brill's Part of Speech Tagger). After testing, we observed that applying a few simple Post Testing Classification Correction (PTCC) rules, created after observing output errors, improved the output of the SVM and Maximum Entropy systems. We achieved very good results with all three methods. Our best configurations (Support Vector Machines with a second-degree polynomial kernel, and Maximum Entropy) both achieved an overall F-measure of 91.06 after the application of the PTCC rules.