Technical Report

Language Technology for Icelandic 2018-2022

Abstract

This report presents the conclusions of a workgroup that the steering group assembled to evaluate the status of Icelandic language technology and to create a five-year project plan. The workgroup suggests that four open core solutions be created: Speech Recogniser; Speech Synthesiser; Machine Translation System; and Spell and Grammar Checker. These will be developed and adapted to Icelandic to the point where they are fully usable, and used by the public, companies, governmental bodies, and institutions in Iceland. The principal prerequisite for building language technology tools is to have language resources and support tools in place, and the report describes the measures required to achieve this. In addition to the project schedule, suggestions are made as to how it will be carried out, with a description of how similar schedules have been executed in other countries.
Chapter
In 1999, the Dutch Language Union, a binational intergovernmental organisation, created the HLT Platform as a means to start an exchange of plans and policy initiatives amongst government officials of Flanders and the Netherlands concerning human language technology for Dutch. During the past decade, the cooperation between these partners has intensified and culminated in the definition and execution of a joint R&D programme, called STEVIN. This paper summarises the past decade of (major) HLTD policy initiatives that led to the creation of the STEVIN programme in the Low Countries. It sketches the institutional framework in which the STEVIN programme flourished. In addition, it provides the main conclusions of the evaluation of the STEVIN programme. The external evaluators qualified the programme as successful.
Article
Children with speech disorders often present with systematic speech error patterns. In clinical assessments of speech disorders, evaluating the severity of the disorder is central. Current measures of severity have limited sensitivity to factors like the frequency of the target sounds in the child's language and the degree of phonological diversity, which are factors that can be assumed to affect intelligibility. By constructing phonological filters to simulate eight speech error patterns often observed in children, and applying these filters to a phonologically transcribed corpus of 350K words, this study explores three quantitative measures of phonological impact: Percentage of Consonants Correct (PCC), edit distance, and degree of homonymy. These metrics were related to estimated ratings of severity collected from 34 practicing clinicians. The results show an expected high correlation between the PCC and edit distance metrics, but that none of the three metrics align with clinicians' ratings. Although these results do not generate definite answers to what phonological factors contribute the most to (un)intelligibility, this study demonstrates a methodology that allows for large-scale investigations of the interplay between phonological errors and their impact on speech in context, within and across languages.
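The three measures named in this abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the tiny consonant inventory, the velar-fronting map used as the example filter, the restriction to substitution-only error patterns (so that target and produced forms align position by position), and the choice to count homonymy as the number of words whose filtered form collides with that of another word.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical inventory and error pattern, for illustration only.
CONSONANTS = {"p", "b", "t", "d", "k", "g", "s", "m", "n", "l", "r"}
VELAR_FRONTING: Dict[str, str] = {"k": "t", "g": "d"}


def apply_filter(phonemes: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Simulate a substitution error pattern by rewriting each phoneme."""
    return [mapping.get(p, p) for p in phonemes]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phoneme sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def pcc(target: Sequence[str], produced: Sequence[str]) -> float:
    """Percentage of target consonants produced correctly.

    Assumes substitution-only errors, so both forms have equal length.
    """
    if len(target) != len(produced):
        raise ValueError("this sketch only handles aligned, equal-length forms")
    pairs = [(t, p) for t, p in zip(target, produced) if t in CONSONANTS]
    if not pairs:
        return 100.0
    return 100.0 * sum(t == p for t, p in pairs) / len(pairs)


def homonymous_words(lexicon: Sequence[Sequence[str]], mapping: Dict[str, str]) -> int:
    """Count words whose filtered form coincides with another word's filtered form."""
    forms = Counter(tuple(apply_filter(word, mapping)) for word in lexicon)
    return sum(count for count in forms.values() if count > 1)


target = ["k", "a", "t"]
produced = apply_filter(target, VELAR_FRONTING)
lexicon = [["k", "a", "t"], ["t", "a", "t"], ["g", "o"], ["d", "o"], ["m", "a"]]
print(produced, edit_distance(target, produced), pcc(target, produced))
print(homonymous_words(lexicon, VELAR_FRONTING))
```

A corpus-scale version would apply each filter to every transcribed word and aggregate the three scores, which is the kind of comparison the abstract describes.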
Article
This article describes the construction and performance of Granska - a surface-oriented system for grammar checking of Swedish text. With the use of carefully constructed error detection rules, written in a new structured rule language, the system can detect and suggest corrections for a number of grammatical errors in Swedish texts. In this article, we specifically focus on how erroneously split compounds and disagreement are handled in the rules. The system combines probabilistic and rule-based methods to achieve high efficiency and robustness. The error detection rules are optimized using statistics of part-of-speech bigrams and words in a way that each rule needs to be checked as seldom as possible. We have found that the Granska system with higher efficiency can achieve the same or
Conference Paper
We have designed, implemented and evaluated an end-to-end spellchecking and autocorrection system that does not require any manually annotated training data. The World Wide Web is used as a large noisy corpus from which we infer knowledge about misspellings and word usage. This is used to build an error model and an n-gram language model. A small secondary set of news texts with artificially inserted misspellings is used to tune confidence classifiers. Because no manual annotation is required, our system can easily be instantiated for new languages. When evaluated on human-typed data with real misspellings in English and German, our web-based systems outperform baselines which use candidate corrections based on hand-curated dictionaries. Our system achieves 3.8% total error rate in English. We show similar improvements in preliminary results on artificial data for Russian and Arabic.
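The combination of an error model with an n-gram language model can be sketched as a small noisy-channel corrector. This is not the system described in the abstract: the class name `Corrector`, the fixed per-word error probability `p_error`, the add-one smoothed bigram model, and the restriction of candidates to a single edit are all simplifying assumptions chosen to keep the example short.

```python
import math
from collections import Counter
from typing import List, Set

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class Corrector:
    """Hypothetical corrector: constant error model plus bigram language model."""

    def __init__(self, tokens: List[str], p_error: float = 0.05) -> None:
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)
        self.p_error = p_error  # assumed probability that a typed word is misspelled

    def bigram_logprob(self, prev: str, word: str) -> float:
        """Add-one smoothed log P(word | prev)."""
        return math.log((self.bigrams[(prev, word)] + 1)
                        / (self.unigrams[prev] + self.vocab_size))

    def correct(self, prev: str, word: str) -> str:
        """Return the best-scoring in-vocabulary candidate, or the word itself."""
        scores = {c: math.log(self.p_error) + self.bigram_logprob(prev, c)
                  for c in edits1(word) if c in self.unigrams}
        if word in self.unigrams:
            # Keeping the typed word costs log(1 - p_error) instead of log(p_error).
            scores[word] = math.log(1 - self.p_error) + self.bigram_logprob(prev, word)
        if not scores:
            return word
        return max(sorted(scores), key=scores.get)


corpus = "the cat sat on the mat the cat ate the rat".split()
corrector = Corrector(corpus)
print(corrector.correct("the", "cqt"))
print(corrector.correct("the", "bat"))
```

In a web-scale setting, the counts would come from the noisy corpus itself and the constant `p_error` would be replaced by a learned error model.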
Article
We introduce an open-source toolkit for neural machine translation (NMT) to support research into model architectures, feature representations, and source modalities, while maintaining competitive performance, modularity and reasonable training requirements.
Conference Paper
Current eye-tracking input systems for people with ALS or other motor impairments are expensive, not robust under sunlight, and require frequent re-calibration and substantial, relatively immobile setups. Eye-gaze transfer (e-tran) boards, a low-tech alternative, are challenging to master and offer slow communication rates. To mitigate the drawbacks of these two status quo approaches, we created GazeSpeak, an eye gesture communication system that runs on a smartphone, and is designed to be low-cost, robust, portable, and easy-to-learn, with a higher communication bandwidth than an e-tran board. GazeSpeak can interpret eye gestures in real time, decode these gestures into predicted utterances, and facilitate communication, with different user interfaces for speakers and interpreters. Our evaluations demonstrate that GazeSpeak is robust, has good user satisfaction, and provides a speed improvement with respect to an e-tran board; we also identify avenues for further improvement to low-cost, low-effort gaze-based communication technologies.
Conference Paper
This paper describes issues involved in the development of a grammar checker in multiple languages at Microsoft Corporation. Focus is on design (selecting and prioritizing error identification rules) and evaluation (determining product quality).
Article
Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. In response to the first problem, efficient pattern-matching and n-gram analysis techniques have been developed for detecting strings that do not appear in a given word list. In response to the second problem, a variety of general and application-specific spelling correction techniques have been developed. Some of them were based on detailed studies of spelling error patterns. In response to the third problem, a few experiments using natural-language-processing tools or statistical-language models have been carried out. This article surveys documented findings on spelling error patterns, provides descriptions of various nonword detection and isolated-word error correction techniques, reviews the state of the art of context-dependent word correction techniques, and discusses research issues related to all three areas of automatic error correction in text.
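The first problem, nonword error detection, can be sketched in two ways that mirror the techniques mentioned in this abstract: exact lookup in a word list, and character n-gram analysis. The function names, the trigram size, and the `^`/`$` boundary markers below are assumptions made for this illustration and are not taken from the survey.

```python
from typing import Iterable, List, Set


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Character n-grams of a word padded with boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_ngram_table(word_list: Iterable[str], n: int = 3) -> Set[str]:
    """Collect every n-gram that occurs in at least one listed word."""
    table: Set[str] = set()
    for word in word_list:
        table.update(char_ngrams(word.lower(), n))
    return table


def is_nonword(token: str, lexicon: Set[str]) -> bool:
    """Word-list lookup: flag any string that is not in the lexicon."""
    return token.lower() not in lexicon


def has_unseen_ngram(token: str, table: Set[str], n: int = 3) -> bool:
    """N-gram analysis: flag a string containing an n-gram absent from the table."""
    return any(gram not in table for gram in char_ngrams(token.lower(), n))


lexicon = {"spell", "spelling", "check", "checker", "the"}
table = build_ngram_table(lexicon)
for token in ["spelling", "speling", "thecker"]:
    print(token, is_nonword(token, lexicon), has_unseen_ngram(token, table))
```

The last token shows the trade-off between the two checks: "thecker" is absent from the word list, yet every one of its trigrams occurs in some listed word, so the n-gram test alone does not flag it.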
• Taylor, P., Black, A. & Caley, R. 1998. The architecture of the Festival Speech Synthesis System. In Proc. 3rd ESCA Workshop on Speech Synthesis, pp. 147-151. Jenolan Caves, Australia.
• Tånnander, C. 2012. An audience response system-based approach to speech synthesis evaluation. In The Fourth Swedish Language Technology Conference (SLTC 2012), pp. 74-75. Lund, Sweden.