Mikel Iruskieta

Universidad del País Vasco / Euskal Herriko Unibertsitatea | UPV/EHU · IXA Group

PhD

Publications: 69
Reads: 9,269
Citations: 321
Introduction
Interested in: Basque language parsing, discourse analysis, NLP (MT, AS and sentiment analysis), didactics of language, language infrastructures, CLARIN, DARIAH, PARTHENOS.
Member of:
  • CLARIN-ERIC KSI committee
Organising:
  • Discourse Relation Parsing and Treebanking (DISRPT), 2019
  • Workshops "RST and Discourse Studies", 2013, 2015, 2017
  • The 25th edition of the Annual Conference of the SEPLN, 2009
  • Gender Equality conference, 2009
  • Third International Workshop on Semantics, Pragmatics and Rhetoric, 2005
Scientific Committee:
  • III Workshop "A RST e os Estudos do Texto", 2011
Projects:
  • NewsReader: European Commission
  • Ber2Tek. Basque Govern
Additional affiliations
October 2008 - present
University of the Basque Country
Position
  • Lecturer
June 2004 - present
University of the Basque Country
Position
  • Researcher
Description
  • IXA is a research group that has been working on Natural Language Processing since 1988. It was created with the aim of developing basic computational resources for Basque.


Publications (69)
Article
Detecting and resolving writing difficulties is of the utmost importance for student success, mostly because school assessment systems are based on written production tasks. Students' writing difficulties come from a variety of sources, neurodevelopmental and language-related among others. The aim of this article is to propose a checklist to identif...
Article
Although we live in a digital environment, we may face two different problems: first, that technological change is rapid or poorly adapted and, second, that the technologies needed for Basque and similar low-resource languages are not created or not made available. In this article, technologies in the processes of learning, teaching and researching Basque...
Preprint
That online university must be built on well-developed psycho-pedagogical foundations, within a plan that promotes and benefits all kinds of learning (formal, non-formal and informal) and research processes in the digital environment as well, and always guided by continuous planning, action and evaluation...
Article
Full-text available
This work analyses the direct effect that expressive reading aloud of the tale Martin Txiki eta Basajaunak has had on the linguistic and communicative competence of children in a specific group, by means of a purpose-designed intervention and study. More specifically, the tale was adapted to the Biscayan dialect and, in Early Childhood Education at the Legarda HLHI school in Mungia, 4...
Article
Full-text available
Lately, discourse structure has received considerable attention due to the benefits its application offers in several NLP tasks such as opinion mining, summarization, question answering, text simplification, among others. When automatically analyzing texts, discourse parsers typically perform two different tasks: i) identification of basic discour...
Conference Paper
Full-text available
In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task on Elementary Discourse Unit Segmentation and Connective Detection. In this paper we review the data included in the task, which cover 2.6 million manually annotated tokens from 15 datasets...
Conference Paper
Full-text available
This overview summarizes the main contributions of the accepted papers at the 2019 workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019). Co-located with NAACL 2019 in Minneapolis, the workshop's aim was to bring together researchers working on corpus-based and computational approaches to discourse relations. In addition to an invite...
Article
Full-text available
The creation of a semantically oriented lexicon of positive and negative words is often the first step in analyzing the sentiment of a corpus. Various methods, supervised and unsupervised, can be employed to create a lexicon. Until now, methods employed to create Basque polarity lexicons were unsupervised. The aim of this paper is to present the construct...
Conference Paper
Full-text available
In this work, we have analysed the effects of negation on the semantic orientation in Basque. The analysis shows that negation markers can strengthen, weaken or have no effect on sentiment orientation of a word or a group of words. Using the Constraint Grammar formalism, we have designed and evaluated a set of linguistic rules to formalize these th...
Conference Paper
Full-text available
Discourse analysis is necessary for different tasks of Natural Language Processing (NLP). Because Spanish and Chinese are two of the most spoken languages in the world, discourse analysis between them is important for NLP research. This paper aims to present the first open Spanish-Chinese parallel corpus annotated with discourse information, whose theoretical fr...
Conference Paper
Full-text available
Corpus-based discourse analysis of Chinese, as the most spoken language in the world, could be useful for language learning and translation studies. We present here the development of the first free open access Chinese discourse segmented corpus following RST, which can help in the evaluation of automatic segmentation systems and in the development...
Conference Paper
Deliberation is an increasingly used concept in Argumentation Theory and Linguistic Analysis. But only recently has research combined empirical and conceptual tool-boxes from these disciplines for the study of deliberative discourse. The aim of this article is to present a discursive analysis of deliberation as a genre using the relational discourse st...
Conference Paper
Full-text available
Deliberation is an increasingly used concept in Argumentation Theory and Linguistic Analysis. But only recently has research combined empirical and conceptual tool-boxes from these disciplines for the study of deliberative discourse. The aim of this article is to present a discursive analysis of deliberation as a genre using the relational discourse st...
Conference Paper
Full-text available
Systems for opinion and sentiment analysis rely on different resources: a lexicon, annotated corpora and constraints (morphological, syntactic or discursive), depending on the nature of the language or text type. In this respect, Basque is a language with fewer linguistic resources and tools than other languages, like English or Spanish. The aim...
Presentation
Full-text available
Some aspects of the Spanish Clarin-K-Centre are described, viewed as a user-oriented infrastructure: a virtual research environment and multilingual text-processing tools.
Article
Pupils read texts at school and then do comprehension exercises on those texts, but they often fail to take the relational structure of the discourse into account. In this article we analyse how primary-school pupils understand a text by means of summaries they wrote themselves. To this end, we created a corpus of summaries of...
Article
Full-text available
Abstract: The small size of corpora in certain research fields is due to the lack of tools for processing language on a large scale and in a simple way. In this article we present ANALHITZA, a tool we are developing within the Clarin-k project, whose main goal is the creation of language technologies...
Article
This paper presents an automatic detector of the discourse central unit (CU) in scientific abstracts based on machine learning techniques. After segmenting a text into its elementary discourse units, the detection of the central unit is a crucial step on the way to robustly build discourse trees under the Rhetorical Structure Theory (RST). Besides, C...
Article
Full-text available
Spanish and Chinese are two very different languages at all linguistic levels. Therefore, translation (both human and machine translation) from one to the other and learning one of them as a foreign language are challenging tasks. Some automatic translation systems exist for this pair of languages, but there is still room to improve the translation qual...
Article
Full-text available
Abstract: The small size of corpora in certain research fields is due to the lack of tools for processing language on a large scale and in a simple way. In this article we present ANALHITZA, a tool we are developing within the Clarin-k project, whose main goal is the creation of language technologies...
Article
Full-text available
A detector of the central unit of a text based on machine learning techniques in scientific texts for Basque. Abstract: In this article we present the first detector of the Central Unit (CU) of scientific abstracts in Basque based on machine learning techniques. After segmenting the text into discourse units...
Conference Paper
Full-text available
Due to the huge population that speaks Spanish and Chinese, these languages occupy an important position in language learning studies. Although there are some automatic translation systems that benefit the learning of both languages, there is still room to create resources in order to help language learners. As a quick and effective resource...
Conference Paper
Full-text available
Abstract: We present the Spanish CLARIN K-Centre (CLARIN Centro-K-español), which is part of the European infrastructure CLARIN, Common Language Resources and Technology Infrastructure, and whose objective is to offer the knowledge and experience of the three groups that initially make it up in the use of technology for research in the humanities and social sciences...
Article
We introduce the Spanish CLARIN Centre-K, a node of the European infrastructure CLARIN, Common Language Resources and Technology, whose objective is to share the knowledge and experience of its three founding constituent groups for research in the humanities and social sciences. © 2016 Sociedad Española para el Procesamiento del Lenguaje Natural.
Article
Full-text available
Nowadays, opinion texts play an important role: people read opinions before they do an activity, buy a product or make a decision. However, the amount of opinion text is increasing rapidly and reading all opinions about a subject is unfeasible. ‘Sentiment analysis’ is a part of Natural Language Processing whose aim is to process opinion te...
Article
Full-text available
Automatically detecting the cause relations of a text may be useful in question answering tasks and event information extraction. The aim of this paper is to study how to detect coherence relations of the cause subgroup (CAUSE, RESULT and PURPOSE). To achieve this aim we have used the Rhetorical Structure Theory (RST) and some automatic linguistic...
Conference Paper
Full-text available
In the RST framework, there are several discourse-annotated corpora available in different languages, such as English, Spanish, Brazilian Portuguese, German and Basque, among others. Some of them can be consulted and several tools have been developed for corpus exploration. There is also a small multilingual aligned RST corpus, which can be explo...
Article
Full-text available
The aim of this paper is to present the development of a rule-based automatic detector which determines the main idea or the most pertinent discourse unit in two different languages, Basque and Brazilian Portuguese, and in two distinct genres, scientific abstracts and argumentative answers. The central unit (CU) may be of interest to u...
Article
Full-text available
Communicative competence is an area that is increasingly worked on in many disciplines. If speakers want to communicate intentions to their listeners and achieve effects on them, it is advisable to learn and practise certain strategies based on pragmatic value. To begin practising those strategies, reading aloud is an excellent exercise...
Conference Paper
Full-text available
This paper presents a study in sentiment analysis which exploits information of the relational discourse structure in a Basque corpus consisting of literature reviews. The QWN-PPV method was employed to label all the texts at element level and the Rhetorical Structure Theory (RST) was used to extract discourse structure information. The preliminary...
Conference Paper
Full-text available
This paper presents an automatic rule-based detector of the most salient discourse units in scientific abstracts. After segmentation, the detection of the central unit is a crucial annotation phase in the Rhetorical Structure Theory (RST), which could be exploited in automatic summarization or question answering tasks. Although there is still room...
Article
Full-text available
We present the first discursive segmenter for Basque implemented by heuristics based on syntactic dependencies and linguistic rules. Preliminary experiments show F1 values of more than 85% in automatic EDU segmentation for Basque.
Chapter
Full-text available
Abstract: In this work we present Rhetorical Structure Theory (RST), the theory most widely used to analyse coherence in the discipline of Computational Linguistics and in the study of relational discourse structure, as well as the Basque RST Treebank (Euskal RST Treebanka), which has been described for Basque thanks to that theory. The corpus was [...] by four linguists...
Conference Paper
Full-text available
This article aims to analyze how agreement regarding the central unit (macrostructure) influences agreement when establishing rhetorical relations (microstructure). To do so, the authors conducted an empirical study of abstracts from research articles in three domains (medicine, terminology, and science) in the framework of Rhetorical Structure T...
Conference Paper
Full-text available
Written human communications usually consist of more than one sentence, and the coherence relations that exist between these sentences cannot be explained in terms of a successive sequence of phrases (van Dijk 1997). Normally, coherent texts have a structure that is much more complex than mere juxtaposition, providing, of course, that the author wi...
Technical Report
Full-text available
Written human communications usually consist of more than one sentence, and the coherence relations that exist between these sentences cannot be explained in terms of a successive sequence of phrases (van Dijk 1997). Normally, coherent texts have a structure that is much more complex than mere juxtaposition, providing, of course, that the author wi...
Article
Full-text available
Explaining why the same passage may have different rhetorical structures when conveyed in different languages remains an open question. Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for the comparison of rhetorical structures in different languages and to specify why translated texts may differ i...
Article
Full-text available
This article presents a discourse annotation methodology based on Rhetorical Structure Theory and an empirical study of annotating a corpus of specialized medical texts in Basque. The annotation process includes two phases: segmentation and annotation of rhetorical relations. Phase one entails an initial study which leads to establishing linguistic...
Article
Full-text available
This paper introduces the first Basque discourse TreeBank annotated with rhetorical relations following Rhetorical Structure Theory. We report the main features of the corpus, such as the annotation criteria, inter-annotator agreement and harmonization procedure. We describe an online search system to check the annotation of discourse relations.
Conference Paper
In this paper we describe some experiments based on a previous Constraint Grammar (CG-2) of Basque Complex Postpositions. We present the development and the evaluation of the rewritten CG-2 and the new CG-3 grammars for processing Basque Complex Postpositions.
Chapter
Among the other linguistic resources available for organising discourse, speakers use reformulation to go back in their discourse and to interpret or present what was previously mentioned from another point of view. Thanks to this retroactive reformulation process, they can summarise the preceding information (summarising reformulation)...
Conference Paper
Full-text available
In this paper we study how to adapt an automatic clause parser to the discourse segmentation task. Taking a corpus manually tagged according to Rhetorical Structure Theory (RST), we processed it with an automatic clause parser and studied the results by comparing the agreement between both annotation systems: automatic and manual. As a r...
Article
Full-text available
This article describes the study on the features used for labelling the discourse structure, according to the Rhetorical Structure Theory, at the inter-sentential and intra-sentential levels. The tagged corpus is composed of medical texts written in Basque and extracted from the medical journal 'Gaceta Médica de Bilbao'. The difficulties encountere...
Article
This article describes the study carried out on the features of discourse structure tagging, according to Rhetorical Structure Theory, at the inter-sentential and intra-sentential levels. The tagged corpus is composed of medical texts written in Basque and extracted from the Gaceta Médica de Bilbao, our...
Conference Paper
Full-text available
In this paper we reflect on an experience we launched this academic year in the Early Childhood Education degree at the E.U. de Magisterio (University School of Teacher Training) of the UPV/EHU. Specifically, we refer to the implementation, within the new EHEA (EEES) curricula, of the new Early Childhood Education degree and, within it, in its first...
Article
Full-text available
This article describes the study on the features used for labelling the discourse structure, according to the Rhetorical Structure Theory, at the inter-sentential and intrasentential levels. The tagged corpus is composed of medical texts written in Basque and extracted from the medical journal 'Gaceta Médica de Bilbao'. The difficulties encountered...
Article
Full-text available
In this work we present a study carried out to find out whether rhetorical relations and the surface markers that signal them have the potential to distinguish between specialized texts from different domains that share a high level of specialization, in two languages as different as Basque and Spanish. For the...
Article
Full-text available
The study we report in this article addresses the results of comparing the rhetorical trees from two different languages carried out by two annotators starting from the Rhetorical Structure Theory (RST). Furthermore, we investigate the methodology for a suitable evaluation, both quantitative and qualitative, of these trees. Our corpus contains abst...
Chapter
Full-text available
The aim of this study is to analyse the discourse relations of Mann and Thompson's (1988) Rhetorical Structure Theory (RST) used in medical discourse in Spanish and Basque, and to detect the discourse markers that signal them. This analysis allows us to observe differences in behaviour between these two languages, so different...
Conference Paper
The aim of this work is to evaluate the dependency-based annotation of EPEC (the Reference Corpus for the Processing of Basque) by means of an experiment: two annotators have syntactically tagged a sample of the mentioned corpus in order to evaluate the agreement-rate between them and to identify those issues that have to be improved in the syntact...
Chapter
In this article we present the result of an empirical study carried out on Basque texts tagged with rhetorical relations. Rhetorical Structure Theory was applied in tagging 10 newspaper texts (1,442 words) randomly extracted from the EPEC corpus (Reference Corpus for the Processing of Basque). From the...
Chapter
Full-text available
The aim of this work is to attempt a unified classification of discourse markers in the Basque language. To get that classification it is necessary to emphasize the importance of speech over grammar. Thus the semantic relations will replace the coordinative conjunction role. The objective of this study concerns therefore the structural relations of...
Article
Full-text available
This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better motivated sense distinctions, and a tighter coupling between both resources. The methodology involves edition, tagging and refereeing tasks. We are currently halfway through th...
Article
Full-text available
Natural Language Processing techniques need to develop lexical-semantic knowledge bases (LSKB) in order to perform semantic interpretation. The IXA group decided to develop a Basque LSKB called EuskalWordNet for this reason. EuskalWordNet is based on WordNet and its multilingual counterpart EuroWordNet. This paper reviews the theoretical and practical a...
Article
Full-text available
This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better motivated sense distinctions, and a tighter coupling between both resources. The methodology involves edition, tagging and refereeing tasks. We are currently halfway through t...

Projects (4)
Archived project
Creation of the first open Spanish-Chinese parallel treebank under Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) for Spanish-Chinese discourse analysis, especially for the translation and language learning tasks between the two languages.