
Lauma Pretkalniņa
University of Latvia | LU · Institute of Mathematics and Computer Science
MSc
Publications: 31
Reads: 4,406
Citations: 150
Education
September 2009 - June 2011
University of Latvia
Field of study
- Computer Science
Publications (31)
This paper deals with the Corpus of early written Latvian and explains the methodology for normalising historical spellings found in texts from the 16th-18th centuries. It describes the types of replacements which will make searching early texts more convenient.
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
The aim of the article is to describe in brief previous experiences with preparing early written Latvian texts for publication in recent times; we review the first attempts carried out by a working group dealing with the modernization of the Corpus of early written Latvian, examine the process and results of the conversion of early texts into moder...
The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language helps further the UD initiative by providing an important dataset for research and natural langu...
This paper presents a work in progress to create a multilayered syntactically and semantically annotated text corpus for Latvian. The broad application area we address is natural language understanding (NLU), while more specific applications are abstractive text summarization and knowledge base population, which are required by the project industri...
Analysis of first-conjugation verbs in the online dictionary Tēzaurs.lv (in Latvian).
In this paper we present the first Universal Dependency Treebank for Latvian. Latvian UD Treebank contains approx. 1 thousand sentences. It has been created from Latvian Treebank newswire texts with the help of an automatic conversion. This resource is an important prerequisite for integrating Latvian in various international language processing fr...
We describe an extensive and versatile lexical resource for Latvian, an under-resourced Indo-European language, which we call Tezaurs (Latvian for 'thesaurus'). It comprises a large explanatory dictionary of more than 250,000 entries that are derived from more than 280 external sources. The dictionary is enriched with phonetic, morphological, seman...
Poster for University of Latvia Conference 2015, Section of Computational Linguistics.
In this paper, we analyze the impact of various dependency representations for various constructions on the general parsing accuracy and on the parsing accuracy of these constructions. We focus on the analysis of coordination constructions, complex predicates, and punctuation mark attachment. We use Latvian Treebank as a dataset, thus, providing in...
In this paper we investigate how different dependency representations of a treebank influence the accuracy of the dependency parser trained on this treebank and the impact on several parser applications: named entity recognition, coreference resolution and limited semantic role labeling. For these experiments we use Latvian Treebank, whose native a...
Poster for University of Latvia Conference 2014, Section of Computational Linguistics.
Syntactic parsing is an important technique in natural language processing, yet Latvian still lacks an efficient general-coverage syntactic parser. This paper reports on the first experiments on statistical syntactic parsing for Latvian, a highly inflective Indo-European language with a relatively free word order. We have induced a statistic...
We describe an approach for morphological analysis combining a rule-based word-level morphological analyzer with statistical tagging, detailing its application to the Latvian language. Latvian is a highly inflective Indo-European language with a rich morphology. The tools described here include an implementation of Latvian inflectional paradigms, a mor...
The Latvian Treebank has been under development since 2010. In this paper we describe the latest developments of this project and the problems currently faced. We examine several gaps in our annotation scheme, such as determinant, ellipsis and insertion annotation, and describe the solutions we have chosen.
In this paper we describe ongoing work on developing a system (a set of web services) for transliterating the Gothic-based Fraktur script of historical Latvian to the Latin-based script of contemporary Latvian. Currently the system consists of two main components: a generic transliteration engine that can be customized with alternative sets of rule...
Development of English-Latvian Statistical Machine Translation System: Methods, Resources and First Results. Summary: This paper presents research and development of English-Latvian Statistical Machine Translation (SMT) prototypes for the legal domain. Several methods have been investigated, i.e., phrase-based models and factored models. Trans...
In this paper the preparatory work for Latvian Treebank has been examined. We use the SemTi-Kamols dependency-based hybrid grammar model integrated with the TrEd toolkit originally developed for Prague Dependency Treebank. We have notably extended and adapted both the SemTi-Kamols model and TrEd to fit them to our needs. As the result the smal...
In this paper we demonstrate a hybrid treebank encoding format, derived from the dependency-based format used in Prague Dependency Treebank (PDT). We have specified a Prague Markup Language (PML) profile for the SemTi-Kamols hybrid grammar model that has been developed for languages with relatively free word order (e.g. Latvian). This has allowed...
In this paper we describe preparatory work for constructing a Treebank for Latvian, as no such resource currently exists. The previously elaborated SemTi-Kamols hybrid dependency-based grammar model has been extended to make it appropriate for broad-coverage text annotation. We also have integrated extended SemTi-Kamols model with graphical tree editor...
This paper is an attempt to discover the main challenges in working with Baltic and Estonian languages, and to identify the most significant sources of errors generated by an SMT system trained on large-vocabulary parallel corpora from the legislative domain. An immense distinction between Latvian/Lithuanian and Estonian languages causes a set of non-eq...
Nonconstructive finite automata were first considered by R. Freivalds. We prove a tight upper bound on the amount of nonconstructivity that can be needed to recognize a language. We prove some theorems about reducing the amount of nonconstructive help needed by encoding that information in the automata. We also show that nonconstructive probabilistic automata...
This paper presents a comparative study of two approaches to statistical machine translation (SMT) and their application to the task of English-to-Latvian translation, which is still an open research line in the field of automatic translation. We consider a state-of-the-art phrase-based SMT system and an alternative N-gram-based SMT system. The major diffe...
This paper presents a comparative study of two alternative approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation. Furthermore, a novel feature intending to reflect the relatively free word order scheme of the Latvian language is proposed and successfully applied on the n-best list resc...
Projects (2)