
Baiba Saulīte
- Dr. philol.
- Senior Researcher at University of Latvia
Publications: 33
Reads: 7,578
Citations: 212
Publications (33)
The article examines Latvian comparison constructions following the classification used in linguistic typology, which distinguishes comparative (gradation), similative, and equative constructions. For each construction type, the linguistic means used in Latvian are shown, and the components of the construction and their characteristic...
Jānis Endzelīns (1873–1961), an internationally recognized Latvian linguist, is one of the greatest Baltists of all time. UNESCO has included Endzelīns’s 150th birthday, February 22, 2023, in its calendar of celebratory days. In honor of the special event, the exhibition “Linguist Jānis Endzelīns – 150” was prepared in the Academic Library of the...
Open speech corpora of substantial size are seldom available for less-spoken languages, and until recently this was also the case for Latvian, with its 1.5M native speakers. While several closed Latvian speech corpora of 100+ hours exist and are used to train competitive models for automatic speech recognition (ASR), there were only a few tiny open dataset...
The Institute of Mathematics and Computer Science of the University of Latvia is developing the Latvian Treebank (LVTB), a syntactically annotated corpus of Latvian. It annotates both the syntactic phenomena already described in Latvian syntactic theory and rarer constructions that grammars have not yet analysed in detail. This article examines the word group (vārdkopa)...
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
We propose an approach for generating an accurate and consistent PropBank-annotated corpus, given a FrameNet-annotated corpus which has an underlying dependency annotation layer, namely, a parallel Universal Dependencies (UD) treebank. The PropBank annotation layer of such a multi-layer corpus can be semi-automatically derived from the existing Fra...
The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language helps further the UD initiative by providing an important dataset for research and natural langu...
This paper presents a work in progress to create a multilayered syntactically and semantically annotated text corpus for Latvian. The broad application area we address is natural language understanding (NLU), while more specific applications are abstractive text summarization and knowledge base population, which are required by the project industri...
This paper presents a work in progress, creating a FrameNet-annotated text corpus for Latvian. This is a part of a larger project which aims at the creation of a multilayered corpus, anchored in cross-lingual state-of-the-art syntactic and semantic representations: Universal Dependencies (UD), FrameNet and PropBank, as well as Abstract Meaning Repr...
An analysis of first-conjugation verbs in the online dictionary Tēzaurs.lv (in Latvian).
In this paper we present the first Universal Dependency Treebank for Latvian. The Latvian UD Treebank contains approximately 1,000 sentences. It has been created from Latvian Treebank newswire texts with the help of an automatic conversion. This resource is an important prerequisite for integrating Latvian into various international language processing fr...
Latvian is a highly inflective language with rather free word order. In general, the unmarked (i.e., the most common) order of elements in a sentence is SVO; however, OVS, SOV, and OSV are also possible and grammatically correct.
Data from the Latvian Valency Lexicon was used to analyse the word order models in Latvian. The paper, first of all, provides an...
We describe an extensive and versatile lexical resource for Latvian, an under-resourced Indo-European language, which we call Tezaurs (Latvian for 'thesaurus'). It comprises a large explanatory dictionary of more than 250,000 entries that are derived from more than 280 external sources. The dictionary is enriched with phonetic, morphological, seman...
Development of a verb valency lexicon for Latvian has recently begun. The chosen approach combines and supplements the experience of similar lexical resources developed for other languages. The paper describes our approach to verb valency annotation: the valency layers (syntactic and semantic valency, selectional restrictions) and the...
Abstract
CONTENT WORDS IN THE FORMAL ANALYSIS OF LATVIAN
Summary
To characterize morphological features of each word in a text, a set of morphological features for Latvian has been defined. It describes grammatical categories and their possible values characteristic of a particular part of speech or a smaller group of words.
To define this set i...
In this paper we demonstrate a hybrid treebank encoding format, derived from the dependency-based format used in Prague Dependency Treebank (PDT). We have specified a Prague Markup Language (PML) profile for the SemTi-Kamols hybrid grammar model that has been developed for languages with relatively free word order (e.g. Latvian). This has allowed...
In this paper we describe preparatory work for constructing a treebank for Latvian, as no such resource currently exists. The previously elaborated SemTi-Kamols hybrid dependency-based grammar model has been extended to make it suitable for broad-coverage text annotation. We have also integrated the extended SemTi-Kamols model with graphical tree editor...
Controlled natural languages (mostly English-based) have recently emerged as seemingly informal supplementary means for OWL ontology authoring, compared with the formal notations used by professional knowledge engineers. In this paper we present, by example, a controlled Latvian language that has been designed to be compliant with the state...
The dependency approach, originally developed by Lucien Tesnière, has become a popular model of syntactic representation. However, state-of-the-art dependency parsers and annotation schemes typically discard some relevant features of Tesnière's original model, retaining only the concept of dependency relations between individual words. The...
The paper proposes a representation of FrameNet as a four-dimensional ontology. This novel representation makes it possible both to re-create the FrameNet ontology from semantically annotated texts and to use the representation for semantic annotation of new texts. Further extensions of this approach with 5th dimension for anaphora annotation is d...
Word sense disambiguation (WSD) and methods for discourse representation of parsed text are among the most difficult tasks in computational linguistics today. Without a satisfactory solution to these problems, the true automated semantic processing of texts, as envisioned by the Semantic Web, machine translation, or information re...
Although phrase structure grammars have turned out to be the more popular approach to the analysis and representation of natural language syntactic structures, dependency grammars are often considered more appropriate for free word order languages. While building a parser for Latvian, a language with a rather free word order, we found (simi...