Tommi A. Pirinen

UiT The Arctic University of Norway · Faculty of Humanities, Social Sciences and Education

PhD

33
Publications
8,119
Reads
345
Citations
Introduction
I do not use ResearchGate actively; please check better alternatives such as ORCID, Semantic Scholar, or Google Scholar.
Additional affiliations
March 2014 - March 2016
Dublin City University
Position
  • PostDoc Position
January 2010 - December 2012
University of Helsinki
Position
  • PhD Student
Description
  • Weighted Finite-State Automata in Spell-Checking and Correction

Publications (33)
Article
Full-text available
This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional m...
Chapter
This is the Festschrift of Dr. Jack Rueter. The book presents peer-reviewed scientific work from Dr. Rueter’s colleagues related to the latest advances in natural language processing, digital resources and endangered languages in a variety of languages such as historical English, Chukchi, Mansi, Erzya, Komi, Finnish, Apurina, Sign Languages, Sami l...
Article
Full-text available
In this introduction we have tried to present concisely the history of language technology for Uralic languages up until today, and a bit of a desiderata from the point of view of why we organised this special issue. It is of course not possible to cover everything that has happened in a short introduction like this. We have attempted to cover the...
Conference Paper
We describe our participation in TweetMT for three language pairs in both directions: Spanish from/to Catalan, Basque and Portuguese. We used a range of techniques: statistical and rule-based MT, morph segmentation, data selection with ParFDA and system combination. As for resources, our focus was on crawling vast amounts of tweets to perform monol...
Article
This paper provides an overview of the research and development activities carried out to alleviate the language resources' bottleneck in machine translation within the Abu-MaTran project. We have developed a range of tools for the acquisition of the main resources required by the two most popular approaches to machine translation, i.e. statistical...
Conference Paper
Full-text available
This paper presents the machine translation systems submitted by the Abu-MaTran project for the Finnish‐English language pair at the WMT 2015 translation task. We tackle the lack of resources and complex morphology of the Finnish language by (i) crawling parallel and monolingual data from the Web and (ii) applying rule-based and unsupervised method...
Article
This article describes a contemporary system for the computational modelling of the morphology of Finnish word-forms called Omorfi. The purpose of this article is to present new developments and an open development model of the morphological analysis of Finnish to the linguistic audience. The article shows Omorfi as a full-fledged, stable system fo...
Conference Paper
The following claims can be made about finite-state methods for spell-checking: 1) Finite-state language models provide support for morphologically complex languages that word lists, affix stripping and similar approaches do not provide; 2) Weighted finite-state models have expressive power equal to other, state-of-the-art string algorithms used by...
Conference Paper
Full-text available
The paper presents and evaluates various NLP tools that have been created using the open source library HFST - Helsinki Finite-State Technology and outlines the minimal extensions that this has required to a pure finite-state system. In particular, the paper describes an implementation and application of Pmatch presented by Karttunen at SFCM 2011.
Article
This article presents a novel way of combining finite-state transducers (FSTs) with electronic dictionaries, thereby creating efficient reading comprehension dictionaries. We compare a North Saami - Norwegian and a South Saami - Norwegian dictionary, both enriched with an FST, with existing, available dictionaries containing pre-generated paradigms...
Article
Full-text available
In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, m...
Article
Full-text available
We describe a predictive text entry system for Finnish combining an open source morphological analyzer Omorfi and a lexical model compiled from Internet Relay Chat (IRC) logs. The system is implemented as a weighted finite-state transducer (WFST) using the freely available WFST library HFST. We show that using IRC logs to train the system gives sub...
Article
Full-text available
HFST - Helsinki Finite-State Technology (http://hfst.sf.net/) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spellcheckers into one open-source platform and supports extending and improving the descriptions with...
Conference Paper
Full-text available
HFST–Helsinki Finite-State Technology ( hfst.sf.net ) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weigh...
Article
Full-text available
In this paper we present an open source implementation of a Finnish morphological parser. We briefly evaluate it against contemporary criticism of monolithic and unmaintainable finite-state language descriptions. We use it to demonstrate a way of writing a finite-state language description that is used for a varying set of projects, that typically ne...
Conference Paper
Full-text available
There are numerous formats for writing spell-checkers for open-source systems and there are many descriptions for languages written in these formats. Similarly, for word hyphenation by computer there are TeX rules for many languages. In this paper we demonstrate a method for converting these spell-checking lexicons and hyphenation rule sets into fi...
Article
Full-text available
There are numerous formats for writing spell-checkers for open-source systems and there are many lexical descriptions for natural languages written in these formats. In this paper, we demonstrate a method for converting Hunspell and related spell-checking lexicons into finite-state automata. We also present a simple way to apply unigram corpus tra...
Article
Full-text available
In a language with very productive compounding and a rich inflectional system, e.g. Finnish, new words are to a large extent formed by compounding. In order to disambiguate between the possible compound segmentations, a probabilistic strategy has been found effective by Lindén and Pirinen [7]. In this article, we present a method for implementing...
Article
Full-text available
In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, m...
Conference Paper
Full-text available
Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application w...
Article
Full-text available
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 89-95. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Ta...
