Lauma Pretkalniņa

University of Latvia | LU · Institute of Mathematics and Computer Science

MSc

About

31 Publications
4,406 Reads
150 Citations
Since 2017: 7 research items, 121 citations
Additional affiliations
June 2006 - present
University of Latvia
Position
  • Researcher
Description
  • I am working on the Latvian Treebank, developing tools and training parsers. In the past I worked on transforming legacy dictionaries into a machine-readable format and developed a transliteration solution for legacy texts.
Education
September 2009 - June 2011
University of Latvia
Field of study
  • Computer Science

Publications (31)
Conference Paper
Full-text available
This paper deals with the Corpus of early written Latvian and explains the methodology for normalising historical spellings found in texts from the 16th to 18th centuries. It describes the types of replacements that will make searching early texts more convenient.
Conference Paper
Full-text available
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
Article
The aim of the article is to describe in brief previous experiences with preparing early written Latvian texts for publication in recent times; we review the first attempts carried out by a working group dealing with the modernization of the Corpus of early written Latvian, examine the process and results of the conversion of early texts into moder...
Conference Paper
Full-text available
The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language helps further the UD initiative by providing an important dataset for research and natural langu...
Conference Paper
Full-text available
This paper presents work in progress on creating a multilayered, syntactically and semantically annotated text corpus for Latvian. The broad application area we address is natural language understanding (NLU), while more specific applications are abstractive text summarization and knowledge base population, which are required by the project industri...
Presentation
Full-text available
Analysis of first-conjugation verbs in the online dictionary Tēzaurs.lv (in Latvian).
Conference Paper
Full-text available
In this paper we present the first Universal Dependencies treebank for Latvian. The Latvian UD Treebank contains approximately 1,000 sentences. It has been created from Latvian Treebank newswire texts with the help of automatic conversion. This resource is an important prerequisite for integrating Latvian in various international language processing fr...
Conference Paper
Full-text available
We describe an extensive and versatile lexical resource for Latvian, an under-resourced Indo-European language, which we call Tezaurs (Latvian for 'thesaurus'). It comprises a large explanatory dictionary of more than 250,000 entries that are derived from more than 280 external sources. The dictionary is enriched with phonetic, morphological, seman...
Data
Full-text available
Poster for University of Latvia Conference 2015, Section of Computational Linguistics.
Conference Paper
Full-text available
In this paper, we analyze the impact of different dependency representations of various constructions on overall parsing accuracy and on the parsing accuracy for these constructions. We focus on the analysis of coordination constructions, complex predicates, and punctuation mark attachment. We use the Latvian Treebank as a dataset, thus, providing in...
Conference Paper
Full-text available
In this paper we investigate how different dependency representations of a treebank influence the accuracy of the dependency parser trained on this treebank and the impact on several parser applications: named entity recognition, coreference resolution and limited semantic role labeling. For these experiments we use the Latvian Treebank, whose native a...
Data
Full-text available
Poster for University of Latvia Conference 2014, Section of Computational Linguistics.
Conference Paper
Full-text available
Syntactic parsing is an important technique in natural language processing, yet Latvian still lacks an efficient general-coverage syntax parser. This paper reports on the first experiments on statistical syntactic parsing for Latvian, a highly inflective Indo-European language with relatively free word order. We have induced a statistic...
Conference Paper
Full-text available
We describe an approach to morphological analysis that combines a rule-based word-level morphological analyzer with statistical tagging, detailing its application to the Latvian language. Latvian is a highly inflective Indo-European language with rich morphology. The tools described here include an implementation of Latvian inflectional paradigms, a mor...
Conference Paper
Full-text available
The Latvian Treebank has been under development since 2010. In this paper we describe the latest developments of this project and the problems it currently faces. We examine several gaps in our annotation scheme, such as the annotation of determinants, ellipsis and insertions, and describe the solutions we have chosen.
Conference Paper
Full-text available
In this paper we describe ongoing work on a system (a set of web services) for transliterating the Gothic-based Fraktur script of historical Latvian into the Latin-based script of contemporary Latvian. Currently the system consists of two main components: a generic transliteration engine that can be customized with alternative sets of rule...
Article
Full-text available
Development of English-Latvian Statistical Machine Translation System: Methods, Resources and First Results. This paper presents research and development of English-Latvian Statistical Machine Translation (SMT) prototypes for the legal domain. Several methods have been investigated, namely phrase-based models and factored models. Trans...
Conference Paper
Full-text available
This paper examines the preparatory work for the Latvian Treebank. We use the SemTi-Kamols dependency-based hybrid grammar model integrated with the TrEd toolkit, originally developed for the Prague Dependency Treebank. We have notably extended and adapted both the SemTi-Kamols model and TrEd to fit our needs. As a result, the smal...
Conference Paper
Full-text available
In this paper we demonstrate a hybrid treebank encoding format, derived from the dependency-based format used in the Prague Dependency Treebank (PDT). We have specified a Prague Markup Language (PML) profile for the SemTi-Kamols hybrid grammar model that has been developed for languages with relatively free word order (e.g. Latvian). This has allowed...
Conference Paper
Full-text available
In this paper we describe preparatory work for constructing a treebank for Latvian, as no such resource currently exists. The previously elaborated SemTi-Kamols hybrid dependency-based grammar model has been extended to make it appropriate for broad-coverage text annotation. We have also integrated the extended SemTi-Kamols model with a graphical tree editor...
Conference Paper
Full-text available
This paper is an attempt to discover the main challenges in working with Baltic and Estonian languages, and to identify the most significant sources of errors generated by an SMT system trained on large-vocabulary parallel corpora from the legislative domain. An immense distinction between Latvian/Lithuanian and Estonian languages causes a set of non-eq...
Conference Paper
Full-text available
Nonconstructive finite automata were first considered by R. Freivalds. We prove a tight upper bound on the amount of nonconstructivity that can be needed to recognize a language. We prove some theorems about reducing the amount of nonconstructive help needed by encoding that information in the automata. We also show that nonconstructive probabilistic automata...
Article
Full-text available
This paper presents a comparative study of two approaches to statistical machine translation (SMT) and their application to the task of English-to-Latvian translation, which is still an open research line in the field of automatic translation. We consider a state-of-the-art phrase-based SMT system and an alternative N-gram-based SMT system. The major diffe...
Article
Full-text available
This paper presents a comparative study of two alternative approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation. Furthermore, a novel feature intending to reflect the relatively free word order scheme of the Latvian language is proposed and successfully applied on the n-best list resc...

Projects (2)
Project
Largest online explanatory dictionary for Latvian.
Project
Developing the first syntactically annotated treebank for Latvian.