Conference Paper

A System for Recognition of Named Entities in Greek.

DOI: 10.1007/3-540-45154-4_39
Conference: Natural Language Processing - NLP 2000, Second International Conference, Patras, Greece, June 2-4, 2000, Proceedings
Source: DBLP


In this paper, we describe work in progress for the development of a Greek named entity recognizer. The system aims at information
extraction applications where large-scale text processing is needed. Speed of analysis, system robustness, and accuracy of results
have been the basic guidelines for the system’s design. Pattern-matching techniques have been implemented on top of an existing
automated pipeline for Greek text processing, and the resulting system relies on non-recursive regular expressions
to capture different types of named entities. For development and testing purposes, we collected a corpus of financial texts
from several web sources and manually annotated part of it. Overall precision and recall are 86% and 81%, respectively.
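The abstract describes named entities captured by flat, non-recursive regular expressions layered on a Greek text-processing pipeline. The Python sketch below illustrates that general idea under stated assumptions: the four entity labels, every pattern, the names `PATTERNS`, `find_entities` and `SAMPLE`, and the leftmost-longest rule for resolving overlapping matches are all hypothetical choices made for illustration, not the paper's grammars, and the pipeline stages (tokenization, etc.) are omitted.

```python
import re
from typing import List, Tuple

# Greek letter classes, including accented (tonos/dialytika) forms.
UPPER = "Α-ΩΆΈΉΊΌΎΏΪΫ"
LOWER = "α-ωάέήίόύώϊϋΐΰ"

# Genitive month names as used in Greek dates ("15 Μαρτίου 2000").
MONTHS = ("Ιανουαρίου|Φεβρουαρίου|Μαρτίου|Απριλίου|Μαΐου|Ιουνίου|"
          "Ιουλίου|Αυγούστου|Σεπτεμβρίου|Οκτωβρίου|Νοεμβρίου|Δεκεμβρίου")

# Hypothetical, deliberately simplified patterns (NOT the paper's grammars).
# Each is a flat regular expression built from alternation and bounded
# repetition; no pattern refers to another, so nothing is recursive.
PATTERNS = [
    # One to four capitalised words followed by the company suffix "Α.Ε.".
    ("ORGANIZATION", re.compile(
        r"(?<!\w)(?:[" + UPPER + "][" + LOWER + r"]+\s){1,4}Α\.Ε\.")),
    # "15 Μαρτίου 2000" or "15/3/2000".
    ("DATE", re.compile(
        r"(?<!\d)(?:\d{1,2}\s(?:" + MONTHS + r")\s\d{4}|\d{1,2}/\d{1,2}/\d{2,4})")),
    # Greek number format: "." groups thousands, "," marks decimals.
    ("MONEY", re.compile(
        r"(?<!\d)\d{1,3}(?:\.\d{3})*(?:,\d+)?\s?(?:δρχ\.|δραχμές|ευρώ|€)")),
    ("PERCENT", re.compile(r"(?<!\d)\d+(?:,\d+)?\s?%")),
]

# Hypothetical example sentence in the style of a financial news item.
SAMPLE = ("Η Εθνική Τράπεζα Α.Ε. ανακοίνωσε στις 15 Μαρτίου 2000 "
          "κέρδη 12.500.000 δρχ., αύξηση 8,5%.")


def find_entities(text: str) -> List[Tuple[int, int, str, str]]:
    """Return non-overlapping (start, end, label, surface) matches."""
    candidates = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            candidates.append((m.start(), m.end(), label, m.group()))
    # Leftmost-longest resolution: sort by start, longer spans first,
    # then keep a candidate only if it begins after the last kept span ends.
    candidates.sort(key=lambda c: (c[0], -(c[1] - c[0])))
    entities = []
    last_end = 0
    for start, end, label, surface in candidates:
        if start >= last_end:
            entities.append((start, end, label, surface))
            last_end = end
    return entities


if __name__ == "__main__":
    for start, end, label, surface in find_entities(SAMPLE):
        print(f"{label:<13}{surface}")
```

Keeping each pattern flat means it can be read, tested and extended on its own; a practical recognizer would need many more patterns plus lexicon lookups, which are not reproduced here.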

  • "Petasis et al. (2002) use the C4.5 machine learning algorithm to update NER grammars. Boutsis et al. (2000) use a collection of 110 hand-crafted grammars. Lucarelli (2005) uses Support Vector Machines to recognize person Named Entities and semi-automatically created patterns to recognize temporal expressions. "
    ABSTRACT: We describe our work on Greek Named Entity Recognition, comparing three different machine learning techniques: (i) Support Vector Machines (SVM), (ii) Maximum Entropy and (iii) Onetime, a shortcut method based on previous work of one of the authors. Most of our system's features use linguistic knowledge provided by morphology, punctuation, the position of lexical units within a sentence and within a text, electronic dictionaries, and the outputs of external tools (a tokenizer, a sentence splitter, and a Hellenic version of Brill's Part of Speech Tagger). After testing, we observed that applying a few simple Post Testing Classification Correction (PTCC) rules, created after observing output errors, improved the output of the SVM and Maximum Entropy systems. We achieved very good results with all three methods. Our best configurations (Support Vector Machines with a second-degree polynomial kernel, and Maximum Entropy) both achieved an overall F-measure of 91.06 after the application of PTCC rules.
  • ABSTRACT: We present a named-entity recognizer for Greek person names and temporal expressions. For temporal expressions, it relies on semi-automatically produced patterns. For person names, it employs two Support Vector Machines that scan the input text in two passes, together with active learning, which reduces the human annotation effort during training.
    12/2006: pages 203-213.
  • ABSTRACT: The term “Named Entity”, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks in which structured information about company activities and defense-related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions, including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important sub-tasks of IE and was called “Named Entity Recognition and Classification (NERC)”.
    Lingvisticæ Investigationes 01/2007; 30(1). DOI:10.1075/li.30.1.03nad
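The first abstract in the list reports an F-measure, whereas the abstract of this paper gives precision (P) and recall (R) separately. Assuming the balanced F-measure (the harmonic mean of P and R), the 86% precision and 81% recall reported above correspond to roughly 83.4 on the same scale. These are different systems, possibly evaluated on different data, so this is a rough point of reference rather than a direct comparison.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.86 \times 0.81}{0.86 + 0.81}
    = \frac{1.3932}{1.67}
    \approx 0.834
```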