Baiba Saulīte

University of Latvia | LU · Institute of Mathematics and Computer Science

About

27 Publications · 5,733 Reads · 152 Citations
Citations since 2017: 11 research items, 114 citations

Publications (27)
Conference Paper
Full-text available
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
Conference Paper
We propose an approach for generating an accurate and consistent PropBank-annotated corpus, given a FrameNet-annotated corpus which has an underlying dependency annotation layer, namely, a parallel Universal Dependencies (UD) treebank. The PropBank annotation layer of such a multi-layer corpus can be semi-automatically derived from the existing Fra...
Conference Paper
Full-text available
The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language helps further the UD initiative by providing an important dataset for research and natural langu...
Conference Paper
Full-text available
This paper presents a work in progress to create a multilayered syntactically and semantically annotated text corpus for Latvian. The broad application area we address is natural language understanding (NLU), while more specific applications are abstractive text summarization and knowledge base population, which are required by the project industri...
Conference Paper
Full-text available
This paper presents work in progress on creating a FrameNet-annotated text corpus for Latvian. This is part of a larger project which aims at the creation of a multilayered corpus, anchored in cross-lingual state-of-the-art syntactic and semantic representations: Universal Dependencies (UD), FrameNet and PropBank, as well as Abstract Meaning Repr...
Presentation
Full-text available
Analysis of first-conjugation verbs in the online dictionary Tēzaurs.lv (in Latvian).
Conference Paper
Full-text available
In this paper we present the first Universal Dependencies treebank for Latvian. The Latvian UD Treebank contains approximately 1,000 sentences. It has been created from Latvian Treebank newswire texts by automatic conversion. This resource is an important prerequisite for integrating Latvian in various international language processing fr...
Article
Full-text available
Latvian is a highly inflective language with rather free word order. In general, the unmarked (i.e., the most common) order of elements in a sentence is SVO; however, OVS, SOV and OSV are also possible and grammatically correct. Data from the Latvian Valency Lexicon was used to analyse the word order models in Latvian. The paper, first of all, provides an...
Conference Paper
Full-text available
We describe an extensive and versatile lexical resource for Latvian, an under-resourced Indo-European language, which we call Tezaurs (Latvian for 'thesaurus'). It comprises a large explanatory dictionary of more than 250,000 entries that are derived from more than 280 external sources. The dictionary is enriched with phonetic, morphological, seman...
Conference Paper
Full-text available
The development of a verb valency lexicon for Latvian has recently begun. The chosen approach combines and supplements the experience of similar lexical resources developed for other languages. The paper describes our approach to verb valency annotation: the valency layers (syntactic and semantic valency, selectional restrictions) and the...
Article
Full-text available
Content Words in the Formal Analysis of Latvian. Summary: To characterize the morphological features of each word in a text, a set of morphological features for Latvian has been defined. It describes grammatical categories and their possible values characteristic of a particular part of speech or a smaller group of words. To define this set i...
Conference Paper
Full-text available
In this paper we demonstrate a hybrid treebank encoding format derived from the dependency-based format used in the Prague Dependency Treebank (PDT). We have specified a Prague Markup Language (PML) profile for the SemTi-Kamols hybrid grammar model, which has been developed for languages with relatively free word order (e.g. Latvian). This has allowed...
Conference Paper
Full-text available
In this paper we describe preparatory work for constructing a treebank for Latvian, as no such resource currently exists. The previously elaborated SemTi-Kamols hybrid dependency-based grammar model has been extended to make it appropriate for broad-coverage text annotation. We have also integrated the extended SemTi-Kamols model with a graphical tree editor...
Conference Paper
Full-text available
Controlled natural languages (mostly English-based) have recently emerged as seemingly informal supplementary means for OWL ontology authoring, compared with the formal notations used by professional knowledge engineers. In this paper we present, by example, a controlled Latvian language that has been designed to be compliant with the state...
Conference Paper
Full-text available
The dependency approach, originally developed by Lucien Tesnière, has become a popular model of syntactic representation. However, state-of-the-art dependency parsers and annotation schemes typically discard some relevant features of Tesnière's original model, retaining only the concept of dependency relations between individual words. The...
Conference Paper
Full-text available
This paper proposes representing FrameNet as a 4D multidimensional ontology. This novel representation allows both re-creating the FrameNet ontology from semantically annotated texts and using the representation for semantic annotation of new texts. Further extension of this approach with a 5th dimension for anaphora annotation is d...
Conference Paper
Full-text available
Word sense disambiguation (WSD) and discourse representation of the parsed text are among the most difficult tasks in computational linguistics today. Without a satisfactory solution to these problems, truly automated semantic processing of texts, as envisioned by the semantic web, machine translation, or information re...
Conference Paper
Full-text available
Although phrase structure grammars have turned out to be the more popular approach to analysing and representing natural language syntactic structures, dependency grammars are often considered more appropriate for free word order languages. While building a parser for Latvian, a language with rather free word order, we found (simi...

Projects (2)
Project
Largest online explanatory dictionary for Latvian.
Project
Developing the first syntactically annotated treebank for Latvian.