Stig-Arne Grönroos

Silo.AI

Doctor of Science (Technology)

About

26 Publications
3,969 Reads
332 Citations
Introduction
Neural machine translation into morphologically rich low-resource languages. The methods focus on subword segmentation and transfer learning.
Additional affiliations
October 2020 - present
Silo.AI
Position
  • Researcher
Description
  • Research and development of machine learning solutions in natural language processing and automatic speech recognition.
January 2013 - January 2021
Aalto University
Position
  • PhD Student
Description
  • Part of the Speech and Language Processing group. My research topic is improving machine translation into low-resource, morphologically complex languages. I have used unsupervised, semi-supervised, active, and transfer learning. I developed, implemented, and evaluated several machine learning methods for machine translation and subword segmentation, including four new Morfessor methods. My responsibilities include teaching in the course Statistical NLP and supervising M.Sc. theses.
September 2011 - December 2012
Finnish Meteorological Institute
Position
  • Research Assistant
Description
  • I was part of the Space Weather research group, which studies phenomena in the solar wind and the magnetic fields of astronomical objects. I developed systems for automatically identifying several types of scientifically interesting events from magnetometer and solar image data. The work involved pattern recognition, image processing, signal processing, and statistics.
Education
April 2014 - January 2021
Aalto University
Field of study
  • Language Technology
September 2012 - May 2014
Aalto University
Field of study
  • Information and Computer Science

Publications (26)
Preprint
Full-text available
This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. We discuss our on-going mission of increasing language coverage and translation quality, and also describe on-going work on the devel...
Preprint
Full-text available
This paper provides the system description of "Silo NLP's" submission to the Workshop on Asian Translation (WAT2022). We have participated in the Indic Multimodal tasks (English->Hindi, English->Malayalam, and English->Bengali Multimodal Translation). For text-only translation, we trained Transformers from scratch and fine-tuned mBART-50 models. Fo...
Article
Full-text available
There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; subword segmentation and regularization techniques can be...
Article
Full-text available
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, image-guided translation, and video-guided translation, which exploit audio an...
Preprint
There are several approaches for improving neural machine translation for low-resource languages: Monolingual data can be exploited via pretraining or data augmentation; Parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; Subword segmentation and regularization techniques can be...
Preprint
Full-text available
Data-driven segmentation of words into subword units has been used in various natural language processing applications such as automatic speech recognition and statistical machine translation for almost 20 years. Recently it has become more widely adopted, as models based on deep neural networks often benefit from subword units even for morphologic...
Preprint
Full-text available
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, image-guided translation, and video-guided translation, which exploit audio an...
Conference Paper
Full-text available
Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We...
Preprint
Full-text available
This article describes the Aalto University entry to the WMT18 News Translation Shared Task. We participate in the multilingual subtrack with a system trained under the constrained condition to translate from English to both Finnish and Estonian. The system is based on the Transformer model. We focus on improving the consistency of morphological s...
Preprint
Full-text available
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top...
Preprint
Full-text available
This paper describes the MeMAD project entry to the IWSLT Speech Translation Shared Task, addressing the translation of English audio into German text. Between the pipeline and end-to-end model tracks, we participated only in the former, with three contrastive systems. We tried also the latter, but were not able to finish our end-to-end model in ti...
Preprint
Full-text available
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top s...
Preprint
Full-text available
This article describes the Aalto University entry to the WMT18 News Translation Shared Task. We participate in the multilingual subtrack with a system trained under the constrained condition to translate from English to both Finnish and Estonian. The system is based on the Transformer model. We focus on improving the consistency of morphological se...
Article
Full-text available
Many Uralic languages have a rich morphological structure, but lack morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistic...
Article
This article presents a comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation. In morphological segmentation, word forms are segmented into morphs, the surface forms of morphemes. In the minimally supervised data-driven learning setting, segmentation models are learned from a small num...
Conference Paper
Full-text available
Many Uralic languages have a rich morphological structure, but lack tools of morphological analysis needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statist...
