This paper presents a study of methods for the normalization of historical texts. The aim of these methods is to learn the relations between historical and contemporary word forms. We have compiled training and test corpora for different languages and scenarios, and we have tried to interpret the results in relation to the features of the corpora and languages....
The aim of the Universal Dependencies project, within the field of Natural Language Processing, is to adapt the dependency-based treebanks created for many languages to a single standard annotation scheme. This article presents the Basque treebank that has been automatically adapted to that model. How this adaptation work was...
Across the Basque Country, from one place to another and from one period to another, speakers have used different constructions, forms or structures in Basque to express the same things. For example, today in some parts of the Basque Country people say Jon artzaina da ('Jon is a shepherd'); in others, Jean artzain da (at one time, this form without the article was more widespread than it is today). XVI....
Abstract: The small size of corpora in certain research fields is due to the lack of tools for processing language simply and on a large scale. In this article we present ANALHITZA, a tool we are developing within the Clarin-k project, whose main objective is the creation of language technologies...
User acceptance of artificial intelligence agents might depend on their ability to explain their reasoning to the users. We focus on a specific text processing task, the Semantic Textual Similarity task (STS), where systems need to measure the degree of semantic equivalence between two sentences. We propose to add an interpretability layer (iSTS fo...
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Ma...
This paper explores three different methods of learning to map variant word forms (dialectal or diachronic) to standard ones from a limited parallel corpus of standard and variant texts, given that a computational description of the standard morphology is available. © 2014 Sociedad Española para el Procesamiento del Lenguaje Natural.
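For orientation only, the mapping task described in this abstract can be contrasted with a naive baseline, which is not one of the three methods the paper explores. The minimal Python sketch below memorizes variant-to-standard pairs from a small parallel word list and, for unseen words, falls back to the most similar word in a standard vocabulary. The function names, the toy word pairs and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from typing import Dict, Iterable, Set, Tuple


def learn_lexicon(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each variant form to the standard form it is most often aligned with."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for variant, standard in pairs:
        counts[variant][standard] += 1
    return {variant: c.most_common(1)[0][0] for variant, c in counts.items()}


def normalize(word: str, lexicon: Dict[str, str], standard_vocab: Set[str]) -> str:
    """Hypothetical baseline: lexicon lookup, then identity, then nearest standard word."""
    if word in lexicon:
        return lexicon[word]
    if word in standard_vocab:
        return word  # already a standard form
    # Sorting gives a deterministic tie-break; ratio() is in [0, 1], higher = more similar.
    return max(sorted(standard_vocab),
               key=lambda s: SequenceMatcher(None, word, s).ratio())


# Toy pairs invented for illustration (not taken from the paper's corpora).
pairs = [("ceru", "zeru"), ("ceru", "zeru"), ("bihotça", "bihotza")]
lexicon = learn_lexicon(pairs)
vocab = {"zeru", "zerua", "bihotza"}
print(normalize("ceru", lexicon, vocab))   # zeru (learned from the pairs)
print(normalize("cerua", lexicon, vocab))  # zerua (nearest standard word)
```

A lookup table alone cannot handle unseen variants, which is why the sketch adds a similarity fallback; the methods studied in the paper instead exploit the available description of standard morphology.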
This article presents an environment developed for Learner Corpus Research and Error Analysis which makes it possible to deal with language errors from different points of view and with several aims. In the field of Intelligent Computer Assisted Language Learning (ICALL), our objective is to gain a better understanding of the language learning proc...
Conference Paper
In this paper we describe some experiments based on a previous Constraint Grammar (CG-2) of Basque Complex Postpositions. We present the development and evaluation of the rewritten CG-2 grammar and the new CG-3 grammar for processing Basque Complex Postpositions.
This article presents the methodology followed to create the BASYQUE application, together with several reflections on that methodology. Although the application is limited to collecting syntactic variation within the dialects of Iparralde (the Northern Basque Country), the methodology and resources it uses are also useful for building a similar application for other dialects...
In this article we present BASYQUE, the application we have developed for the study of syntactic variation. Although this project focuses mainly on the dialects of the French Basque Country (Iparralde), BASYQUE is a multilingual application that is also useful for analysing other languages and/or dialects. In addition to presenting the option...
The coexistence of five languages with official status in the Iberian Peninsula (Basque, Catalan, Galician, Portuguese, and Spanish) has prompted collaborative efforts to share and cross-develop resources and materials for these languages of the region. However, it is not the case that comprehension boundaries only exist between each of these five...
Conference Paper
This paper deals with theoretical problems found in the work that is being carried out for annotating semantic roles in the Basque Dependency Treebank (BDT). We will present the resources used and the way the annotation is being done. Following the model proposed in the PropBank project, we will show the problems found in the annotation process and...
Conference Paper
The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators have syntactically tagged a sample of this corpus in order to evaluate the agreement rate between them and to identify those issues that have to be improved in the syntact...
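The abstract does not say which agreement measure the experiment used. As one common option, the Python sketch below computes the raw observed agreement between two annotators and Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), which discounts the agreement p_e expected by chance. The function names and the toy dependency labels are assumptions for illustration, not EPEC data.

```python
from collections import Counter
from typing import Hashable, Sequence


def observed_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items to which both annotators assigned the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: both annotators independently pick the same label.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy dependency labels for four tokens (illustrative only).
annotator_1 = ["ncsubj", "ncobj", "ncmod", "ncmod"]
annotator_2 = ["ncsubj", "ncobj", "ncobj", "ncmod"]
print(observed_agreement(annotator_1, annotator_2))       # 0.75
print(round(cohens_kappa(annotator_1, annotator_2), 3))   # 0.636
```

Here the annotators agree on 3 of 4 tokens (p_o = 0.75), but chance agreement is p_e = 5/16, so kappa drops to 7/11 ≈ 0.636.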
In this article, we comment on the steps that have to be taken to annotate a corpus linguistically and the difficulties that arise in this process. Our main objective was to highlight the importance of the labelling when preparing a corpus that is useful for linguistic research, and the need to establish criteria and to take the decisions...
Conference Paper
Knowledge construction is expensive for Computer Assisted Assessment. When setting exercise questions, teachers use Test Makers to construct Question Banks. The addition of Automatic Generation to assessment applications decreases the time spent on constructing examination papers. In this article, we present ArikIturri, an Automatic Question Ge...
In this article, we present a Computer Assisted Language Learning (CALL) environment for Basque. The environment has different aims: on the one hand, to offer the users (teachers, learners and computational linguists) different tools and language resources to clarify the linguistic doubts they might have about the language, and on the other hand, t...
IRAKAZI is a teacher-oriented, web-based system designed for studying the learning process of learners of Basque. The system draws on extensive prior work on NLP tools, error detection and ICALL environments, and it is easily transferable to other languages.
Conference Paper
This article presents a robust syntactic analyser for Basque and the different modules it contains. Each module is structured in different analysis layers, in which each layer takes the information provided by the previous layer as its input, thus creating a gradually deeper syntactic analysis in cascade. This analysis is carried out using the Cons...
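The cascade described above, where each layer consumes the previous layer's output, can be illustrated with a minimal Python sketch. The formalism the analyser actually uses is cut off in this listing; the sketch simply uses plain functions to show the data flow. The two layers, the toy lexicon and the tag names are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List

Token = Dict[str, str]
Layer = Callable[[List[Token]], List[Token]]

# Hypothetical toy lexicon standing in for a real morphological analyser.
TOY_POS = {"gizonak": "NOUN", "etxea": "NOUN", "ikusi": "VERB", "du": "AUX"}


def pos_layer(tokens: List[Token]) -> List[Token]:
    """Layer 1: add a part-of-speech tag to every token."""
    return [dict(t, pos=TOY_POS.get(t["form"], "X")) for t in tokens]


def chunk_layer(tokens: List[Token]) -> List[Token]:
    """Layer 2: reads layer 1's tags; marks verb-chain tokens VC, all others NP (crude rule)."""
    return [dict(t, chunk="VC" if t["pos"] in ("VERB", "AUX") else "NP") for t in tokens]


def run_cascade(forms: List[str], layers: List[Layer]) -> List[Token]:
    """Feed the output of each layer to the next, deepening the analysis step by step."""
    tokens = [{"form": f} for f in forms]
    return reduce(lambda analysis, layer: layer(analysis), layers, tokens)


result = run_cascade(["gizonak", "etxea", "ikusi", "du"], [pos_layer, chunk_layer])
for token in result:
    print(token)
```

Because `chunk_layer` reads the `pos` key written by `pos_layer`, the layers must run in order; adding a deeper layer means appending another function to the list.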
In this work we present a dynamic classification defined to store and classify errors. The collected information will be the starting point for the study of Basque language learning process and for research on different fields such as Error Analysis (EA) and Natural Language Processing (NLP). This error classification is integrated in some NLP tool...
This paper describes the methodology we have adopted to ensure the quality of the Basque WordNet in terms of coverage, correctness, completeness and adequacy. The Basque WordNet follows the EuroWordNet framework and, basically, it is produced using a semi-automatic method that links Basque words to the English WordNet. We have found that in order t...
This paper presents systems for syntactic chunking and clause identification for Basque, combining rule-based grammars with machine-learning techniques. Specifically, we used Filtering-Ranking with Perceptrons (Carreras, Màrquez and Castro, 2005): a learning model that recognizes partial syntactic structures in sentences, obtaining state-of-the-art pe...
In computer-based language teaching systems, learners are usually presented with closed exercises, for example choosing the correct answer among four options. In this type of exercise, the computer has the correct answer stored in advance. In an open exercise, by contrast, learners can write whatever they want. In this case, the learner's answer is not...
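Closed exercises of the kind described above can be graded automatically because the correct answer is stored before the learner responds. The minimal Python sketch below models such a four-option item; the class name, its fields and the toy item are assumptions for illustration. Open exercises, where no stored answer exists, are outside its scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClosedItem:
    """A multiple-choice item whose correct answer is stored in advance."""
    prompt: str
    options: List[str]
    correct_index: int

    def check(self, chosen_index: int) -> bool:
        # Grading is a simple comparison against the stored answer.
        return chosen_index == self.correct_index


# Hypothetical four-option item (illustrative only).
item = ClosedItem(
    prompt="Question 1 (toy example)",
    options=["option A", "option B", "option C", "option D"],
    correct_index=0,
)
print(item.check(0))  # True
print(item.check(2))  # False
```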


