
Eiríkur Rögnvaldsson · University of Iceland | HI · Faculty of Icelandic and Comparative Cultural Studies
Cand.Mag.
About
117
Publications
24,420
Reads
910
Citations
Introduction
Additional affiliations
January 1986 - present
Publications (117)
In this paper we describe how a fairly new CLARIN member is building a broad collection of national language resources for use in language technology (LT). As a CLARIN C-centre, CLARIN-IS is hosting metadata for various text and speech corpora, lexical resources, software packages and models. The providers of the resources are universities, institut...
Guest editor's introduction.
In this paper, we describe a new national language technology programme for Icelandic. The programme, which spans a period of five years, aims at making Icelandic usable in communication and interactions in the digital world, by developing accessible, open-source language resources and software. The research and development work within the programm...
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade h...
The aim of the project of excellence Greining á málfræðilegum afleiðingum stafræns málsambýlis (Analysis of the grammatical consequences of digital language contact), described in this article, is to examine the status of Icelandic at a time of great social and technological change, with particular regard to the possible influence of English, especially through digital media. First, an attempt is made to determine how much linguistic input or...
English is increasingly influencing the Icelandic language community, raising concerns about the state and prospects of the Icelandic language. Recent studies indicate that such concerns are probably justified. The viability of the language depends on it being used in all areas of daily communication and the attitudes of speakers toward the contact...
We present Risamálheild, the Icelandic Gigaword Corpus (IGC), a corpus containing more than one billion running words from mostly contemporary texts. The work was carried out with a minimal amount of work and resources, focusing on material that is not protected by copyright and sources which could provide us with large chunks of text for each cleare...
The paper describes work in progress to compile an Icelandic Gigaword Corpus (IGC). The initial aim of the project was to compile a large corpus of contemporary texts with at least a billion running words, with the minimum amount of work and resources. Thus we focussed on material not protected by copyright and sources which could provide us with l...
This paper proposes that the digital domains of language use (DDLU) be included in future assessments of language vitality. DDLU, including the consumption of online content and engagement with social media and chat, now make up an important, and rapidly growing, part of the daily language use in many speech communities. This is true even in communi...
This paper describes the Málrómur corpus, an open, manually verified, Icelandic speech corpus. The recordings were collected in 2011–2012 by Reykjavik University and the Icelandic Center for Language Technology in cooperation with Google. 152 hours of speech were recorded from 563 participants. The recordings were subsequently manually inspected by...
This paper describes work in progress. We experiment with training a state-of-the-art tagger, Stagger, on a new gold standard, MIM-GOLD, for the PoS tagging of Icelandic. We compare the results to results obtained using a previous gold standard, IFD. Using MIM-GOLD, tagging accuracy is considerably lower, 92.76% compared to 93.67% accuracy for IFD....
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative's work throughout Europe in order to boost progress a...
In Icelandic, a few verbs of sense and perception can take two types of complements; either finite that-clauses with a verb in the subjunctive as in (ia), or infinitival clauses (without the infinitival marker), as in (ib): (i) a. Mér sýnist [að hún sé rík]. me(dat.) seems [that she is(subjunctive) rich] ‘It seems to me that she is rich’ b. Mér sýn...
This paper presents ongoing work that aims to improve machine parsing of Faroese using a combination of Faroese and Icelandic training data. We show that even if we only have a relatively small parsed corpus of one language, namely 53,000 words of Faroese, we can obtain better results by adding information about phrase structure from a closely rela...
In this paper, we describe the correction of PoS tags in a new Icelandic corpus, MIM-GOLD, consisting of about 1 million tokens sampled from the Tagged Icelandic Corpus, MÍM, released in 2013. The goal is to use the corpus, among other things, as a new gold standard for training and testing PoS taggers. The construction of the corpus was first desc...
We describe the current status of Icelandic language technology with respect to available language resources and tools. The recent META-NET survey of the state of language technology support for 30 languages clearly demonstrated that Icelandic lags behind almost all European languages in this respect. However, it is encouraging that as a result of...
Eiríkur Rögnvaldsson. Talmál og tilbrigði: Skráning, mörkun og setningafræðileg nýting talmálssafna (Spoken language and variation: transcription, tagging and syntactic use of spoken-language corpora). 0. Introduction. This chapter discusses the use of spoken-language data for studying variation in sentence structure, and how the data must be prepared so that it can be worked with and searched. The chapter is divided into three parts. The first part discusses the...
This paper reports on some results of a large-scale study of variation in Icelandic syntax, supported by the Icelandic Research Fund. It has been referred to as IceDiaSyn (Icelandic Dialect Syntax) and was associated with the Scandinavian networks ScanDiaSyn (Scandinavian Dialect Syntax) and NORMS (Nordic Center of Excellence in Microcomparative...
The purpose of the Almannarómur project is collecting data for a speech corpus (database) for Icelandic. Its main aim is creating an open source speech project to enable research and development for Icelandic language technology. The database is particularly suitable for acoustic modelling for speech recognition but it could also be used for other...
As part of the META-NORD project, the state of affairs in language technology in the Nordic and Baltic countries is being described in a set of eight reports. Each language report describes the situation of a language community and the position of the language service and language technology industry for that language. This position paper present...
The META-NORD project has contributed to an open infrastructure for language resources (data and tools) under the META-NET umbrella. This paper presents the key objectives of META-NORD and reports on the results achieved in the first year of the project. META-NORD has mapped and described the national language technology landscape in the Nordic and...
This project report describes a multilingual wordnet initiative embarked in the META-NORD project and concerned with the validation and pilot linking between Nordic and Baltic wordnets. The builders of these wordnets have applied very different compilation strategies: The Danish, Icelandic and Swedish wordnets are being developed via monolingual di...
In this paper, we describe the development of a morphosyntactically tagged corpus of Icelandic, the MÍM corpus. The corpus consists of about 25 million tokens of contemporary Icelandic texts collected from varied sources during the years 2006–2010. The corpus is intended for use in Language Technology projects and for linguistic research. We descri...
We describe the background for and building of IcePaHC, a one million word parsed historical corpus of Icelandic which has just been finished. This corpus, which is completely free and open, contains fragments of 60 texts ranging from the late 12th century to the present. We describe the text selection and text collecting process and discuss the qua...
We present an overview of an ongoing project which has the aim of developing methods for building a treebank of Icelandic. The treebank will contain both written and spoken language, and in addition have a diachronic dimension. Since Icelandic is an example of what has been called a less-resourced language when it comes to computational linguistics...
We describe experiments with morphosyntactic tagging of Old Icelandic (Old Norse) narrative texts using different tagging models for the TnT tagger [3] and a tagset of almost 700 tags, originally developed for Modern Icelandic. It is shown that by using a model that has been trained on both Old and Modern Icelandic texts, we can get 92.7% tagging a...
We experiment with extending the dictionaries used by three open-source part-of-speech taggers, by using data from a large Icelandic morphological database. We show that the accuracy of the taggers can be improved significantly by using the database. The reason is that the unknown word ratio reduces dramatically when adding data from the database t...
This paper introduces the META-NORD project, which develops the Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications....
We present an overview of an ongoing project which has the aim of developing methods for building a treebank of Icelandic. The treebank will contain texts from various different periods. Since Icelandic is an example of what has been called a less-resourced language when it comes to computational linguistics and language technology, it is essential...
We describe the establishment and development of Icelandic language technology since its very beginning ten years ago. The ground was laid with a report from an Expert Group appointed by the Minister of Education, Science and Culture in 1998. In this report, which was delivered in the spring of 1999, the group proposed several actions to establish...
In this paper, we describe the development of a new tagged corpus of Icelandic, consisting of about 1 million tokens. The goal is to use the corpus, among other things, as a new gold standard for training and testing PoS taggers. We describe the individual phases of the corpus construction, i.e. text selection and cleaning, sentence segmentation an...
Ten years ago, the Icelandic government launched a special Language Technology Program with the aim of supporting institutions and companies in creating basic resources for Icelandic language technology work. This initiative resulted in the creation and development of several important resources and tools that have had profound influence on Iceland...
Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources. Editors: Rickard Domeij, Kimmo Koskenniemi, Steven Krauwer, Bente Maegaard, Eiríkur Rögnvaldsson and Koenraad de Smedt. NEALT Proceedings Series, Vol. 5 (2009), i-ii. © 2009 The editors and contributors. Published by Northern European A...
Context-sensitive spelling correction is the task of correcting spelling errors which result in valid words. We present work in progress where we adapt established methods from English to a morphologically rich language and conclude that the rich morphology negatively affects performance. However, our system is still good enough to be useful in r...
Previous work on part-of-speech (PoS) tagging Icelandic has shown that the morphological complexity of the language poses considerable difficulties for PoS taggers. In this paper, we increase the tagging accuracy of Icelandic text by using two methods. First, we present a new tagger, by integrating an HMM tagger into a linguistic rule-based tagger....
We present a new mixed method lemmatizer for Icelandic, Lemmald, which achieves good performance by relying on IceTagger [1] for tagging and The Icelandic Frequency Dictionary [2] corpus for training. We combine the advantages of data-driven machine learning with linguistic insights to maximize performance. To achieve this, we make use of a novel a...
We describe experiments with morphosyntactic tagging of Old Norse narrative texts using different tagging models for the TnT tagger (Brants, 2000) and a tagset of almost 700 tags. It is shown that by using a model that has been trained on both Modern Icelandic texts and Old Norse texts, we can get 92.65% tagging accuracy which is considerably bette...
We describe the establishment and development of Icelandic language technology since its very beginning ten years ago. The ground was laid with a report from a committee appointed by the Minister of Education, Science and Culture in 1998. In this report, which was delivered in the spring of 1999, the committee proposed several actions to establish...
Icelandic is a morphologically complex language, for which language technology resources are scarce. Only a few years ago, it could be stated that language technology was practically non-existent in Iceland. In this paper, we describe the development of an NLP toolkit for processing the language, the challenges faced and the decisions made during d...
We describe and evaluate an incremental finite-state parser for Icelandic – the first parser published for the language. Input to the parser is POS tagged text and it generates output according to a shallow syntactic annotation scheme, specifically designed for this project. The parser consists of a phrase structure module and a syntactic functions...
This paper discusses the use and historical development of the combination sjálfur ‘self’ + possessive pronoun. It is shown that in Old Icelandic, the possessive pronoun agreed in gender, number, and case with the noun that it modified, whereas sjálfur stood in the genitive, agreeing in number and case with its antecedent. Already in Old Icelandic,...
The purpose of this paper is to try to trace the origins of the expletive það ‘it, there’ in Icelandic. The first section is an overview of different types of sentences beginning with það in Modern Icelandic, and thus provides a necessary background for this study. It has usually been claimed that Old Icelandic did not have any expletives (Smári 19...
It is a fairly standard assumption that in imperatives, as in yes/no-questions and Narrative Inversion, the finite verb is in C, and an operator of the relevant type is hosted in Spec-CP. Therefore, no lexical phrase can be moved to Spec-CP, and hence these sentence types will be verb-initial on the surface. This analysis also explains why these se...
The purpose of this paper is to describe and account for the word order variation found in the VP in Old Icelandic. It is shown that even though the IP in Old Icelandic was clearly head-initial, with movement of the finite verb to I⁰
The purpose of this paper is to show that Old Icelandic must be assumed to have had oblique (quirky) subjects. However, the author emphasizes that he is mainly concerned with syntactic change, not the subject or non-subject status of certain pre-verbal oblique NPs. His arguments are meant to show that the NPs in question behave in exactly the same...
This paper discusses the function and meaning of the word nema ‘unless, except, but’. Most grammar books classify this word as a subordinate (conditional) conjunction, and describe it as the "negative counterpart" of ef ‘if’. However, the author argues that the meaning of nema is not ‘if not’; nema always implies or introduces an exception from the...
This paper presents a detailed analysis of the behavior of the Icelandic conjunction enda. It turns out to be possible to explain all the puzzling facts about this conjunction by reinterpreting the traditional division between coordination and subordination by breaking it up into two binary features, [±CO] and [±SUB]. Furthermore, it is argued that...
In this paper I will make some comments on Anderson’s and Maling’s papers in this volume on reflexivization in Icelandic. I will show that Anderson’s theory makes some predictions which are not borne out by the facts, while Maling’s notion of predication often makes it difficult to see what the predictions of her theory actually are. Occasional...