Kristīne Levāne-Petrova

University of Latvia | LU · Institute of Mathematics and Computer Science

PhD

30
Publications
3,306
Reads
103
Citations

Publications (30)
Conference Paper
Full-text available
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
Article
Full-text available
Verbal voice and the questions related to it have always been a topical subject of grammatical research and of interest to many linguists. In Latvian, too, voice, and the passive voice in particular, has been treated in various grammars of Latvian, and many separate studies have been devoted to it. Although there are many descriptions and studies of Latvian on the...
Chapter
The articles in the internationally and anonymously reviewed collection are based on the papers developed within the subproject No. 8 “Latvian Language Acquisition” framework of the National Research Programme “Latvian Language” Nr. VPP-IZM-2018/2-0002 and presented at the sections of the XIII International Congress of Balticists. The collection is...
Chapter
Full-text available
This paper describes lessons learned from developing the most recent Balanced Corpus of Modern Latvian (LVK2018) from various online sources. Most of the new corpora are created from data obtained from various text holders, which requires cooperation agreements with each of the text holders. Reaching these cooperation agreements is a difficult and...
Article
Full-text available
This paper presents a detailed error annotation for morphologically rich languages. The described approach is used to create the Latvian Language Learner corpus (LaVA), which is part of the ongoing project "Development of Learner corpus of Latvian: methods, tools and applications". There is no need for an advanced multi-token error annotation sche...
Article
Full-text available
The popularity of learning Latvian as a foreign language is increasing. Latvian as a foreign language is taught not only in higher education institutions in Latvia, but also in more than 20 universities outside Latvia (Šalme 2008; Šalme 2011; Laizāne 2019). Therefore, corpus-based and corpus-driven teaching materials are crucial for the int...
Preprint
Full-text available
The paper presents a quality-focused approach to learner corpus development. The methodology was developed with multiple design considerations put in place to make the annotation process easier and at the same time reduce the number of mistakes that could be introduced due to inconsistent text correction or carelessness. The approach suggested in t...
Article
Full-text available
A learner corpus is a systematically computerized database of texts produced by language learners (of both foreign and second languages). It is the basis for studying the characteristics of foreign learners' language and for developing data-driven Latvian teaching materials and methodological aids. A learner corpus, like other language corpora, can be annotated at various linguistic lev...
Conference Paper
Full-text available
This article presents a different method for creating error-annotated corpora. The approach suggested in this paper consists of multiple parts: text correction, automated morphological analysis, automated text alignment and error annotation. Error annotation can easily be semi-automated with a rule-based system, similar to the one used in this pa...
Article
Full-text available
The aim of this article is to analyze passive constructions in Latvian and Lithuanian. The Latvian correspondences of the Lithuanian past passive and present passive participles used as predicates (which, together with the auxiliary, constitute the passive forms in Lithuanian) were analyzed in the Lithuanian-Latvian-Lithuanian parallel corpus (LiLa)...
Article
Full-text available
Latvian is a highly inflected language with rather free word order. In general, the unmarked (i.e., the most common) order of elements in a sentence is SVO; however, OVS, SOV and OSV are also possible and grammatically correct. Data from the Latvian Valency Lexicon was used to analyse the word order models in Latvian. The paper, first of all, provides an...
Article
Full-text available
The Balanced Corpus of Modern Latvian and the Text Selection Criteria. Summary: Recently, the Balanced Corpus of Modern Latvian (~3.5 million running words) was created at the Institute of Mathematics and Computer Science (IMCS) (see http://www.korpuss.lv). The Corpus has been compiled from printed and electronic materials created afte...
Conference Paper
Full-text available
Statistical machine translation (SMT) is a hot research topic not only for languages with large quantities of parallel language resources available, but also for under-resourced languages, including languages of Baltic countries. Evaluation of SMT systems in Baltic countries is mostly done by using automatic metrics. In this paper we present a ling...
Conference Paper
This paper examines the preparatory work for the Latvian Treebank. We use the SemTi-Kamols dependency-based hybrid grammar model integrated with the TrEd toolkit originally developed for the Prague Dependency Treebank. We have notably extended and adapted both the SemTi-Kamols model and TrEd to fit our needs. As the result the smal...
Conference Paper
In this paper we demonstrate a hybrid treebank encoding format, derived from the dependency-based format used in the Prague Dependency Treebank (PDT). We have specified a Prague Markup Language (PML) profile for the SemTi-Kamols hybrid grammar model that has been developed for languages with relatively free word order (e.g. Latvian). This has allowed...
Conference Paper
In this paper we describe preparatory work for constructing a treebank for Latvian, as no such resource currently exists. The previously elaborated SemTi-Kamols hybrid dependency-based grammar model has been extended to make it appropriate for broad-coverage text annotation. We have also integrated the extended SemTi-Kamols model with graphical tree editor...
Conference Paper
The last six years have been very important for research and development of language technologies in Latvia. Several large projects have been funded by the government of Latvia, important tools and resources have been created by the industry, and since 2006 Latvia has participated in the CLARIN initiative. Although there is still a gap in language...
Conference Paper
Full-text available
A representation of FrameNet as a 4D multidimensional ontology is proposed in the paper. This novel representation allows both re-creating the FrameNet ontology from semantically annotated texts and using the representation for semantic annotation of new texts. Further extensions of this approach with a 5th dimension for anaphora annotation are d...
Conference Paper
Full-text available
Word sense disambiguation (WSD) and methods for discourse representation of parsed text are among the most difficult tasks in computational linguistics today. Without a satisfactory solution to these problems, the true automated semantic processing of texts, as envisioned by semantic web, machine translation, or information re...
