Article

Development of a dependency-based statistical parser for Basque

Procesamiento del Lenguaje Natural, ISSN 1135-5948, No. 39, 2007, pp. 5-12.
Source: OAI

ABSTRACT

This paper presents the first steps towards a statistical parser for Basque. The system is based on a treebank annotated with syntactic dependencies and on an adaptation of the deterministic parser of Nivre et al. (2007). The parser performs shift/reduce analysis, using a machine-learning classifier to decide which of four actions to take at each step, and produces a single dependency analysis for each input sentence. The results are close to those obtained by similar systems.

This work was funded by the Department of Industry and Culture of the Basque Government (AnHITZ 2006 project, IE06-185).
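The abstract describes the parsing loop only in outline. The Python sketch below shows one way a deterministic shift/reduce dependency parser of this kind can work. It assumes that the four options are the SHIFT, REDUCE, LEFT-ARC and RIGHT-ARC transitions of Nivre's arc-eager system; the abstract does not name them. The machine-learned classifier is replaced by a hypothetical `choose` function, here a static oracle that reads gold heads, which is how training instances for such a classifier are typically derived from a treebank. All names (`Config`, `legal`, `apply`, `parse`, `static_oracle`) and the example analysis are illustrative assumptions, not taken from the paper or the treebank.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: the "4 options" are the four arc-eager transitions (Nivre).
SHIFT, REDUCE, LEFT_ARC, RIGHT_ARC = "SHIFT", "REDUCE", "LEFT-ARC", "RIGHT-ARC"


@dataclass
class Config:
    stack: List[int]   # token indices; 0 is the artificial ROOT
    buffer: List[int]  # unread input tokens; buffer[0] is the next token
    heads: Dict[int, int] = field(default_factory=dict)  # dependent -> head


def legal(action: str, c: Config) -> bool:
    """Preconditions of the arc-eager transitions."""
    top = c.stack[-1] if c.stack else None
    if action == SHIFT:
        return bool(c.buffer)
    if action == RIGHT_ARC:
        return bool(c.buffer) and top is not None
    if action == LEFT_ARC:
        # ROOT never gets a head, and a token gets at most one head
        return bool(c.buffer) and top not in (None, 0) and top not in c.heads
    if action == REDUCE:
        # only a token that already has its head may leave the stack
        return top is not None and top in c.heads
    return False


def apply(action: str, c: Config) -> None:
    """Apply one transition to the configuration in place."""
    if action == SHIFT:
        c.stack.append(c.buffer.pop(0))
    elif action == REDUCE:
        c.stack.pop()
    elif action == LEFT_ARC:   # next input token becomes head of the stack top
        c.heads[c.stack.pop()] = c.buffer[0]
    elif action == RIGHT_ARC:  # stack top becomes head of the next input token
        dep = c.buffer.pop(0)
        c.heads[dep] = c.stack[-1]
        c.stack.append(dep)


def parse(n_tokens: int, choose: Callable[[Config], str]) -> Dict[int, int]:
    """Greedy deterministic parse: exactly one transition per step, picked by
    `choose` (a trained classifier in a real parser; hypothetical here)."""
    c = Config(stack=[0], buffer=list(range(1, n_tokens + 1)))
    while c.buffer:
        action = choose(c)
        if not legal(action, c):
            action = SHIFT  # fallback for an illegal prediction (an assumption)
        apply(action, c)
    for t in range(1, n_tokens + 1):
        c.heads.setdefault(t, 0)  # tokens left unattached are attached to ROOT
    return dict(sorted(c.heads.items()))


def static_oracle(gold: Dict[int, int]) -> Callable[[Config], str]:
    """Picks the transition that reproduces a gold (projective) tree; running it
    over a treebank yields (configuration, action) pairs for training."""
    def choose(c: Config) -> str:
        s0, b0 = c.stack[-1], c.buffer[0]
        if gold.get(s0) == b0:
            return LEFT_ARC
        if gold[b0] == s0:
            return RIGHT_ARC
        if any(gold[b0] == k or gold.get(k) == b0 for k in c.stack[:-1]):
            return REDUCE
        return SHIFT
    return choose


if __name__ == "__main__":
    # Toy sentence "Mikelek liburua irakurri du" ("Mikel has read the book");
    # the gold heads below are hand-made for illustration, not from the treebank.
    tokens = ["Mikelek", "liburua", "irakurri", "du"]
    gold = {1: 3, 2: 3, 3: 0, 4: 3}
    print(parse(len(tokens), static_oracle(gold)))  # {1: 3, 2: 3, 3: 0, 4: 3}
```

In a real system, `choose` would extract features from the current configuration (for example, the word forms and part-of-speech tags of the top stack token and the next input token) and ask a trained classifier for the next action. Because exactly one action is applied per step, the parser makes a linear number of decisions and returns a single analysis per sentence.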

Available from: Koldo Gojenola, Feb 03, 2015
    • "was built on EPEC, a corpus that contains 300,000 words of standard written texts which is intended to be a training corpus for the development and improvement of several NLP tools (Bengoetxea and Gojenola, 2007). "
    ABSTRACT: This paper presents the work that has been carried out to annotate semantic roles in the Basque Dependency Treebank (BDT) (Aldezabal et al., 2009). In this paper we will present the resources we have used and the way the annotation of 100 verbs has been done. We have followed the model proposed in the PropBank project (Palmer et al., 2005). In addition, we have adapted AbarHitz (Díaz de Ilarraza et al., 2004), a tool used in the construction of the Basque Dependency Treebank (BDT), for the task of annotating semantic roles.
    Conference Paper · Jan 2010
    • "For our task we will use the Basque Dependency Treebank (BDT). The Basque Dependency Treebank was built on EPEC, a corpus that contains 300,000 words of standard written texts which is intended to be a training corpus for the development and improvement of several NLP tools (Bengoetxea and Gojenola, 2007). Around one third of this collection was obtained from the Statistical Corpus of 20th Century Basque (http://www.euskaracorpusa.net). "
    ABSTRACT: This paper deals with theoretical problems found in the work that is being carried out for annotating semantic roles in the Basque Dependency Treebank (BDT). We will present the resources used and the way the annotation is being done. Following the model proposed in the PropBank project, we will show the problems found in the annotation process and decisions we have taken. The representation of the semantic tag has been established and detailed guidelines for the annotation process have been defined, although it is a task that needs continuous updating. Besides, we have adapted AbarHitz, a tool used in the construction of the BDT, to this task.
    Conference Paper · Jan 2010
    • "In this article we present the parsing grammar implemented for each of these three languages which, together with Euskera (Aranzabe, M. et al 2004; Bengoetxea, K. et al 2007), are those we are working on in the framework of the KNOW project. "
    ABSTRACT: Automatic deep parsing is necessary for any NLP application requiring a certain level of semantic representation. One of the goals of the KNOW project is the development of wide-coverage deep parsing grammars that will be made available to the scientific community. In this article we present an implementation of Spanish, Catalan and English grammars in the FreeLing environment. These three languages, together with Basque, are the ones we work on in KNOW.
    Article