Development of a statistical dependency-based syntactic parser for Basque (Desarrollo de un analizador sintáctico estadístico basado en dependencias para el euskera)

Procesamiento del Lenguaje Natural, ISSN 1135-5948, No. 39, 2007, pp. 5-12; 01/2007
Source: OAI

ABSTRACT This paper presents the first steps towards a statistical syntactic parser for Basque. The system builds on a treebank syntactically annotated with dependencies and on an adaptation of the deterministic parser of Nivre et al. (2007). That parser combines shift/reduce analysis with a machine learning module that determines which of 4 options to take, and it produces a single syntactic dependency analysis of the input sentence. The results are close to those obtained by similar systems. This work is funded by the Department of Industry and Culture of the Basque Government (project AnHITZ 2006, IE06-185).
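The shift/reduce scheme mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the "4 options" correspond to the four transitions of Nivre's arc-eager system (Shift, Reduce, Left-Arc, Right-Arc); the abstract does not name them. The function names `parse` and `gold_oracle` are hypothetical, and the oracle that reads decisions off a known gold tree merely stands in for the trained machine learning module, which is not described here.

```python
from typing import Callable, List, Optional

SHIFT, REDUCE, LEFT_ARC, RIGHT_ARC = "shift", "reduce", "left-arc", "right-arc"

Heads = List[Optional[int]]
Chooser = Callable[[List[int], List[int], Heads], str]


def parse(n_words: int, choose: Chooser) -> Heads:
    """Deterministic arc-eager parse of a sentence with words 1..n_words.

    Index 0 is an artificial root. `choose` plays the role of the
    classifier: it inspects the stack, buffer and arcs built so far and
    returns one of the four transitions. Returns heads[i] = head of word i.
    """
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n_words + 1))
    heads: Heads = [None] * (n_words + 1)

    while buffer:
        action = choose(stack, buffer, heads)
        top, nxt = stack[-1], buffer[0]
        if action == LEFT_ARC and top != 0 and heads[top] is None:
            heads[top] = nxt          # next input word becomes head of stack top
            stack.pop()
        elif action == RIGHT_ARC:
            heads[nxt] = top          # stack top becomes head of next input word
            stack.append(buffer.pop(0))
        elif action == REDUCE and heads[top] is not None:
            stack.pop()               # stack top already has a head; discard it
        else:
            stack.append(buffer.pop(0))  # SHIFT (also the fallback for invalid moves)
    return heads


def gold_oracle(gold: Heads) -> Chooser:
    """Stand-in for the learned classifier: derives transitions from a gold tree."""
    def choose(stack: List[int], buffer: List[int], heads: Heads) -> str:
        top, nxt = stack[-1], buffer[0]
        if top != 0 and gold[top] == nxt:
            return LEFT_ARC
        if gold[nxt] == top:
            return RIGHT_ARC
        # Reduce when a word deeper in the stack still has to link with nxt.
        if heads[top] is not None and any(
            gold[nxt] == s or gold[s] == nxt for s in stack[:-1]
        ):
            return REDUCE
        return SHIFT
    return choose


# Three-word sentence whose second word is the root and governs words 1 and 3.
gold_tree: Heads = [None, 2, 0, 2]
predicted = parse(3, gold_oracle(gold_tree))
```

Because exactly one transition is applied at each step and none is ever undone, the procedure yields a single analysis per sentence, which matches the deterministic behaviour the abstract describes. In a trained system, `choose` would be replaced by a classifier over features of the current configuration.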

  • ABSTRACT: The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators syntactically tagged a sample of the corpus in order to measure the agreement rate between them and to identify the issues that need to be improved in the syntactic annotation process. In this article we present the quantitative and qualitative results of this evaluation.
    Computational Linguistics and Intelligent Text Processing, 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009. Proceedings; 01/2009
  • ABSTRACT: This paper deals with theoretical problems encountered while annotating semantic roles in the Basque Dependency Treebank (BDT). We present the resources used and the way the annotation is being done. Following the model proposed in the PropBank project, we show the problems found in the annotation process and the decisions we have taken. The representation of the semantic tag has been established and detailed guidelines for the annotation process have been defined, although the task needs continuous updating. In addition, we have adapted AbarHitz, a tool used in the construction of the BDT, to this task.
    Computational Linguistics and Intelligent Text Processing, 11th International Conference, CICLing 2010, Iasi, Romania, March 21-27, 2010. Proceedings; 01/2010
  • ABSTRACT: This project aims to add meaning, knowledge and reasoning to current interface technologies. Specifically, KNOW is providing novel natural language interpretation and reasoning capabilities to current multilingual computer applications: full syntactic parsers and semantic interpreters (including word sense disambiguation systems and semantic role labelers) for the languages involved in the project, a common conceptual structure (the Multilingual Central Repository or MCR), and both automatic reasoning and analogy-based inference based on the MCR. KNOW has opened the way for a new generation of broad-coverage unrestricted-domain concept-based language understanding applications. KNOW has demonstrated the feasibility of these technologies in two applications linked to the project EPOs: Cross Lingual Information Retrieval and Question/Answering on two multi-modal databases. Consult the project website (90) for further information.
