Baiba Saulīte

Verified
Baiba verified their affiliation via an institutional email.
  • Dr. philol.
  • Senior Researcher at University of Latvia

About

  • Publications: 33
  • Reads: 7,578
  • Citations: 212
Current institution
University of Latvia
Current position
  • Senior Researcher

Publications (33)
Article
Full-text available
The article examines Latvian comparison constructions following the classification used in typological linguistics, which divides them into comparative (graded comparison), similative, and equative constructions. For each construction type, it shows the linguistic means used in Latvian and identifies the components of the construction and their characteristic ...
Book
Full-text available
Jānis Endzelīns (1873–1961), an internationally recognized Latvian linguist, is one of the greatest Baltists of all time. UNESCO has included Endzelīns’s 150th birthday, February 22, 2023, in its calendar of celebratory days. In honor of the special event, the exhibition “Linguist Jānis Endzelīns – 150” was prepared in the Academic Library of the...
Conference Paper
Full-text available
Open speech corpora of substantial size are seldom available for less-spoken languages, and this was recently the case also for Latvian with its 1.5M native speakers. While there exist several closed Latvian speech corpora of 100+ hours, used to train competitive models for automatic speech recognition (ASR), there were only a few tiny open dataset...
Article
Full-text available
The Institute of Mathematics and Computer Science of the University of Latvia is developing the „Latviešu valodas sintaktiski marķētais korpuss” (Latvian syntactically annotated corpus, LVTB), which annotates both the syntactic phenomena of Latvian already described in Latvian syntactic theory and rarer constructions that grammars have not yet analysed in detail. This article examines word-group (vārdkopa) an...
Conference Paper
Full-text available
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
Conference Paper
We propose an approach for generating an accurate and consistent PropBank-annotated corpus, given a FrameNet-annotated corpus which has an underlying dependency annotation layer, namely, a parallel Universal Dependencies (UD) treebank. The PropBank annotation layer of such a multi-layer corpus can be semi-automatically derived from the existing Fra...
Conference Paper
The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language helps further the UD initiative by providing an important dataset for research and natural langu...
Conference Paper
Full-text available
This paper presents a work in progress to create a multilayered syntactically and semantically annotated text corpus for Latvian. The broad application area we address is natural language understanding (NLU), while more specific applications are abstractive text summarization and knowledge base population, which are required by the project industri...
Conference Paper
Full-text available
This paper presents a work in progress, creating a FrameNet-annotated text corpus for Latvian. This is a part of a larger project which aims at the creation of a multilayered corpus, anchored in cross-lingual state-of-the-art syntactic and semantic representations: Universal Dependencies (UD), FrameNet and PropBank, as well as Abstract Meaning Repr...
Presentation
Analysis of first-conjugation verbs in the online dictionary Tēzaurs.lv (in Latvian).
Conference Paper
In this paper we present the first Universal Dependency Treebank for Latvian. The Latvian UD Treebank contains approximately 1,000 sentences. It has been created from Latvian Treebank newswire texts with the help of an automatic conversion. This resource is an important prerequisite for integrating Latvian in various international language processing fr...
Article
Full-text available
Latvian is a highly inflective language with rather free word order. In general, the unmarked (i.e., the most common) order of elements in a sentence is SVO; however, OVS, SOV, and OSV are also possible and grammatically correct. Data from the Latvian Valency Lexicon was used to analyse the word order models in Latvian. The paper, first of all, provides an...
Conference Paper
We describe an extensive and versatile lexical resource for Latvian, an under-resourced Indo-European language, which we call Tezaurs (Latvian for 'thesaurus'). It comprises a large explanatory dictionary of more than 250,000 entries that are derived from more than 280 external sources. The dictionary is enriched with phonetic, morphological, seman...
Conference Paper
Full-text available
The development of a verb valency lexicon for Latvian has been recently started. The chosen approach combines and supplements the experience of similar lexical resources developed for other languages. The paper describes our approach to the verb valency annotation—the valency layers (syntactic and semantic valency, selectional restrictions) and the...
Article
Full-text available
Content Words in the Formal Analysis of Latvian. Summary: To characterize morphological features of each word in a text, a set of morphological features for Latvian has been defined. It describes grammatical categories and their possible values characteristic of a particular part of speech or a smaller group of words. To define this set i...
Conference Paper
In this paper we demonstrate a hybrid treebank encoding format, derived from the dependency-based format used in Prague Dependency Treebank (PDT). We have specified a Prague Markup Language (PML) profile for the SemTi-Kamols hybrid grammar model that has been developed for languages with relatively free word order (e.g. Latvian). This has allowed...
Conference Paper
In this paper we describe preparatory work for constructing a treebank for Latvian, as no such resource currently exists. The previously elaborated SemTi-Kamols hybrid dependency-based grammar model has been extended to make it appropriate for broad-coverage text annotation. We have also integrated the extended SemTi-Kamols model with graphical tree editor...
Conference Paper
Full-text available
Controlled natural languages (mostly English-based) have recently emerged as seemingly informal supplementary means for OWL ontology authoring, compared to the formal notations used by professional knowledge engineers. In this paper we present, by example, a controlled Latvian language that has been designed to be compliant with the state...
Conference Paper
Full-text available
The dependency approach, originally developed by Lucien Tesnière, has become a popular model of syntactic representation. However, the state-of-the-art dependency parsers and annotation schemes typically discard some relevant features of the original Tesnière's model, retaining only the concept of dependency relations between individual words. The...
Conference Paper
Full-text available
A representation of FrameNet as a 4D multidimensional ontology is proposed in the paper. This novel representation makes it possible both to re-create the FrameNet ontology from semantically annotated texts and to use it for semantic annotation of new texts. A further extension of this approach with a 5th dimension for anaphora annotation is d...
Conference Paper
Full-text available
Word sense disambiguation (WSD) and discourse representation of the parsed text are among the most difficult tasks in computational linguistics today. Without providing a satisfactory solution to these problems, the true automated semantic processing of texts, as envisioned by semantic web, machine translation, or information re...
Conference Paper
Full-text available
Although phrase structure grammars have turned out to be a more popular approach for analysis and representation of the natural language syntactic structures, dependency grammars are often considered as being more appropriate for free word order languages. While building a parser for Latvian, a language with a rather free word order, we found (simi...
