Ilze Auziņa
University of Latvia | LU · Institute of Mathematics and Computer Science

Dr. philol.

About

Publications: 42
Reads: 6,660
Citations: 181
Additional affiliations
Position
  • methodical
January 2012 - present
Faculty of Humanities - University of Latvia
Position
  • Lecturer
April 2008 - present
Institute of Mathematics and Computer Science
Position
  • Senior Researcher
Education
October 1998 - September 2001
University of Latvia
Field of study
  • Philology
September 1996 - June 1998
University of Latvia, Faculty of Philology
Field of study
  • Linguistics
September 1992 - June 1996
University of Latvia, Faculty of Philology
Field of study
  • Baltic Philology

Publications (42)
Conference Paper
Open speech corpora of substantial size are seldom available for less-spoken languages, and until recently this was also the case for Latvian, with its 1.5M native speakers. While several closed Latvian speech corpora of 100+ hours exist and have been used to train competitive models for automatic speech recognition (ASR), there were only a few tiny open dataset...
Article
The article examines the advantages and disadvantages of corpus data and of individually collected material. The availability of various grammatically annotated Latvian corpora enables increasingly extensive corpus-based grammar research. Individual collection, in turn, has played an important role in the development of linguistics and is the older way of obtaining practical material. However...
Chapter
Ten years ago, when META-NET conducted a study on Language Technology support for Europe’s languages, Latvian was assessed as a language with little or no support (Skadiņa et al. 2012). During the last decade, progress has been made in the development of language resources and tools for Latvian, particularly with respect to advanced datasets and la...
Article
The concept of correlation as an indicator of cross-compliance and relations has been widely used in the natural sciences and engineering, and it is increasingly applied in sub-fields of the social sciences and the humanities to describe the interrelation between phenomena, concepts, events, etc. In this study, correlation is used as a m...
Conference Paper
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
Chapter
The articles in this internationally and anonymously reviewed collection are based on papers developed within the framework of subproject No. 8 “Latvian Language Acquisition” of the National Research Programme “Latvian Language” (No. VPP-IZM-2018/2-0002) and presented in sections of the XIII International Congress of Balticists. The collection is...
Chapter
This paper describes ongoing work on the creation of Latvian language resources for the medical domain, focusing on digital imaging, in order to develop a medical speech recognition system for Latvian. The language resources include a pronunciation lexicon, a text corpus for language modelling, and an orthographically transcribed speech corpus for the (i)...
Article
This paper presents a detailed error annotation approach for morphologically rich languages. The described approach is used to create the Latvian Language Learner corpus (LaVA), which is part of the ongoing project “Development of Learner corpus of Latvian: methods, tools and applications”. There is no need for an advanced multi-token error annotation sche...
Article
High-quality, reliable language resources and natural language processing tools are key elements for research in digital humanities (DH). Several research infrastructures, e.g., CLARIN and DARIAH, provide access to digital research objects across Europe and beyond. Although these are pan-European research infrastructures, availability of content...
Article
The popularity of learning Latvian as a foreign language is increasing. Latvian as a foreign language is taught not only in higher education institutions in Latvia, but also at more than 20 universities outside Latvia (Šalme 2008; Šalme 2011; Laizāne 2019). Therefore, corpus-based and corpus-driven teaching materials are crucial for the int...
Preprint
The paper presents a quality-focused approach to learner corpus development. The methodology was developed with multiple design considerations in place to make the annotation process easier and, at the same time, to reduce the number of mistakes that could be introduced through inconsistent text correction or carelessness. The approach suggested in t...
Article
A learner corpus is a database of systematically computerized texts produced by language learners (of both foreign and second languages). It provides the basis for studying the characteristics of foreign learners' language and for developing data-driven Latvian teaching materials and methodological aids. Like other language corpora, a learner corpus can be annotated at various linguistic lev...
Conference Paper
This paper describes the release of a corpus of the Saeima (the parliament of Latvia) as an open data resource for multidisciplinary research. The corpus consists of transcriptions of Latvian parliamentary debates from 1993 to 2017, containing 38 million tokens from 468 speakers. Current comparative research of parliamentary debate is not sufficiently...
Conference Paper
This article presents a different method for creating error-annotated corpora. The approach suggested in this paper consists of multiple parts: text correction, automated morphological analysis, automated text alignment and error annotation. Error annotation can easily be semi-automated with a rule-based system, similar to the one used in this pa...
Conference Paper
We present the results of the Latvian IT Competence Centre (IT CC) in developing several essential language technologies and applications. Eleven language technology projects were completed in the first phase of the IT CC's work. We describe how IT CC has contributed to filling the gaps and improving the quality of the basic language technologies...
Article
Latvian is a highly inflective language with rather free word order. In general, the unmarked (i.e., the most common) order of elements in a sentence is SVO; however, OVS, SOV and OSV are also possible and grammatically correct. Data from the Latvian Valency Lexicon was used to analyse word order models in Latvian. The paper, first of all, provides an...
Conference Paper
We describe an extensive and versatile lexical resource for Latvian, an under-resourced Indo-European language, which we call Tezaurs (Latvian for 'thesaurus'). It comprises a large explanatory dictionary of more than 250,000 entries that are derived from more than 280 external sources. The dictionary is enriched with phonetic, morphological, seman...
Conference Paper
In this paper the authors present a speech corpus designed and created for the development and evaluation of dictation systems in Latvian. The corpus consists of over nine hours of orthographically annotated speech from 30 different speakers. The corpus features spoken commands that are common for dictation systems for text editors. The corpus is e...
Book
The methodological aid “Latviešu valodas prasmes līmeņi: pamatlīmenis A1, A2, vidējais līmenis B1, B2” (Latvian Language Proficiency Levels: Basic Level A1, A2, Intermediate Level B1, B2) describes the process of acquiring Latvian and language use in accordance with the Breakthrough (A1), Waystage (A2), Threshold (B1) and Vantage (B2) content specifications developed by the Council of Europe. The publication provides a general description of the language proficiency levels...
Article
The paper provides an overview of the organizational principles of interactive distance teaching and of their implementation in a continuing education course for bilingual education teachers of six programme subjects. Distance teaching sessions were organized in computer classes at 10 schools. The paper considers possible solutions for interactive distan...
Conference Paper
Although human language technologies have a long history in Latvia, Latvian is still an under-resourced language, as there are many gaps in basic language technologies and tools. However, despite difficulties, some of these gaps in both resources and tools have been filled in the last five years. The main goal of this paper is...
Conference Paper
Grapheme-to-phoneme modelling is one of the key components of automated speech recognition and speech synthesis. In this paper, the authors compare two different approaches: a statistical machine translation based method using the phonetically transcribed Latvian Speech Recognition Corpus and a rule-based method for phonetic transcription of words fr...
Conference Paper
In this paper the authors present the first Latvian speech corpus designed specifically for speech recognition purposes. The paper outlines the decisions made in the corpus design process through an analysis of related work on speech corpus creation for different languages. The authors also provide guidelines that were used for the creation of the...
Article
The paper describes work in progress on building a catalogue of named entities (people, places and organizations) based on a large, recently digitized Latvian corpus (4.5 billion tokens). The authors propose an annotation standard for markup of named entities within the Latvian corpus, according to which a representative set of documents (150 000 words)...
Conference Paper
The last six years have been very important for research and development of language technologies in Latvia. Several large projects have been funded by the government of Latvia, important tools and resources have been created by the industry, and since 2006 Latvia has participated in the CLARIN initiative. Although there is still a gap in language...
Conference Paper
The aim of this paper is to present the development stages and the current state of the Spoken Latvian Corpus. Development of the Spoken Latvian Corpus began in 2006 (funded by the Latvian Council of Science), and several individual speech corpora have been developed. There are several stages in the creation of Spoken Latvian...
Conference Paper
The paper describes the development of the Latvian Text-to-Speech Synthesizer at the Institute of Mathematics and Computer Science (University of Latvia).
Conference Paper
The paper proposes representing FrameNet as a 4D multidimensional ontology. This novel representation allows both re-creating the FrameNet ontology from semantically annotated texts and using this representation for semantic annotation of new texts. A further extension of this approach with a 5th dimension for anaphora annotation is d...
Conference Paper
Word sense disambiguation (WSD) and methods for discourse representation of parsed text are among the most difficult tasks in computational linguistics today. Without a satisfactory solution to these problems, truly automated semantic processing of texts, as envisioned by the semantic web, machine translation, or information re...
