
Normunds Grūzītis
University of Latvia | LU · Institute of Mathematics and Computer Science
PhD
48 Publications
7,399 Reads
278 Citations
Additional affiliations
January 2014 - June 2015
September 2005 - present
Publications (48)
Berkeley FrameNet is a lexico-semantic resource for English based on the theory of frame semantics. It has been exploited in a range of natural language processing applications and has inspired the development of framenets for many languages. We present a methodological approach to the extraction and generation of a computational multilingual Frame...
We present the creation of an English-Swedish FrameNet-based grammar in Grammatical Framework. The aim of this research is to make existing framenets computationally accessible for multilingual natural language applications via a common semantic grammar API, and to facilitate the porting of such a grammar to other languages. In this paper, we describ...
The paper presents ongoing research that aims at OWL ontology authoring and verbalization using a deterministic controlled natural language (CNL) that would be as natural and intuitive as possible. Moreover, we focus on a multilingual CNL interface to OWL by considering both highly analytical and highly synthetic languages (namely, English and L...
When an OWL ontology, together with SWRL rules, is defined or verbalized in controlled natural language (CNL) it is important to ensure that the meaning of CNL statements will be unambiguously (predictably) interpreted by both human and machine. CNLs that are based on analytical languages (namely, English) impose a number of syntactic restrictions...
Although phrase structure grammars have turned out to be a more popular approach for the analysis and representation of natural language syntactic structures, dependency grammars are often considered more appropriate for free word order languages. While building a parser for Latvian, a language with a rather free word order, we found (simi...
LNCC is a diverse collection of Latvian language corpora representing both written and spoken language and is useful for both linguistic research and language modelling. The collection is intended to cover diverse Latvian language use cases and all the important text types and genres (e.g. news, social media, blogs, books, scientific texts, debates...
In the medical domain various approaches are used to produce examination reports and other medical records. Depending on the language-specific technology support, the type of examination, the size of the hospital or clinic, and other aspects, the reporting workflow can range from completely manual to (semi-)automated. A manual workflow may complete...
This research focuses on the interrelation between COVID-19 news content on the three largest online news sites in Latvia (delfi.lv, apollo.lv, tvnet.lv) and the audience reaction to the news in the Latvian and Russian channels during the state of emergency. By using a tool for audience behaviour analysis, the Index of the Internet Aggressiveness (I...
We present a new ontology language, Pini, and the PiniTree ontology editor supporting it. Although the Pini language bears a lot of similarities to RDF, UML class diagrams, Property Graphs and their frontends like Google Knowledge Graph and Protégé, it is a more expressive language enabling FrameNet-style natural language annotation for Atomised journal...
This paper describes ongoing work on the creation of Latvian language resources for the medical domain focusing on digital imaging to develop a medical speech recognition system for Latvian. The language resources include a pronunciation lexicon, a text corpus for language modelling, and an orthographically transcribed speech corpus for the (i)...
Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides gramm...
This paper presents work in progress to create a multilayered syntactically and semantically annotated text corpus for Latvian. The broad application area we address is natural language understanding (NLU), while more specific applications are abstractive text summarization and knowledge base population, which are required by the project industri...
This paper presents a work in progress, creating a FrameNet-annotated text corpus for Latvian. This is a part of a larger project which aims at the creation of a multilayered corpus, anchored in cross-lingual state-of-the-art syntactic and semantic representations: Universal Dependencies (UD), FrameNet and PropBank, as well as Abstract Meaning Repr...
In the era of Big Data and Deep Learning, there is a common view that machine learning approaches are the only way to achieve robust and scalable information extraction and summarization. It has been recently proposed that the CNL approach could be scaled up, building on the concept of embedded CNL and, thus, allowing for CNL-based informatio...
Ontologies are one of the core foundations of the Semantic Web. To participate in Semantic Web projects, domain experts need to be able to understand the ontologies involved. Visual notations can provide an overview of the ontology and help users to understand the connections among entities. However, the users first need to learn the visual notatio...
Normative texts are documents based on the deontic notions of obligation, permission, and prohibition. Our goal is to model such texts using the C-O Diagram formalism, making them amenable to formal analysis, in particular verifying that a text satisfies properties concerning causality of actions and timing constraints. We present an experimental, sem...
We are concerned with the analysis of normative texts - documents based on the deontic notions of obligation, permission, and prohibition. Our goal is to make queries about these notions and verify that a text satisfies certain properties concerning causality of actions and timing constraints. This requires taking the original text and building a r...
We describe an extensive and versatile lexical resource for Latvian, an under-resourced Indo-European language, which we call Tezaurs (Latvian for 'thesaurus'). It comprises a large explanatory dictionary of more than 250,000 entries that are derived from more than 280 external sources. The dictionary is enriched with phonetic, morphological, seman...
Nowadays, most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture of an information extraction system based on the concept of Embedded Controlled Lang...
The goal of this paper is to propose a system that can extract formal semantic knowledge representation from natural language eGov policies. We present an architecture that allows for extracting Controlled Natural Language (CNL) statements from heterogeneous natural language texts with the ability to support multilinguality. The approach is based o...
This paper presents a semi-automatic approach to acquire a computational construction grammar from the semi-formal Swedish Constructicon. The implementation is based on the resource grammar library provided by Grammatical Framework and can be seen as an extension to the existing Swedish resource grammar. An important consequence of this work is tha...
The invention relates to data processing methods as well as methods of systematic multilingual lexicalization of object type properties in web ontology language (OWL) ontologies. The proposed computer-implemented method of defining the lexical form and the syntactic valence of OWL object type properties comprises instructions to be carried out in a...
This paper presents a currently bilingual but potentially multilingual FrameNet-based grammar library implemented in Grammatical Framework. The contribution of this paper is two-fold. First, it offers a methodological approach to automatically generate the grammar based on semantico-syntactic valence patterns extracted from FrameNet-annotated corpo...
The development of a verb valency lexicon for Latvian has recently been started. The chosen approach combines and supplements the experience of similar lexical resources developed for other languages. The paper describes our approach to the verb valency annotation: the valency layers (syntactic and semantic valency, selectional restrictions) and the...
In this paper we present ongoing research investigating the possibility and potential of integrating frame semantics, particularly FrameNet, in Grammatical Framework (GF) application grammar development. An important component of GF is its Resource Grammar Library (RGL) that encapsulates the low-level linguistic knowledge about morphology an...
In this paper we describe ongoing work on developing a system (a set of web-services) for transliterating the Gothic-based Fraktur script of historical Latvian to the Latin-based script of contemporary Latvian. Currently the system consists of two main components: a generic transliteration engine that can be customized with alternative sets of rule...
This paper describes an open-source Latvian resource grammar implemented in Grammatical Framework (GF), a programming language for multilingual grammar applications. GF differentiates between concrete grammars and abstract grammars: translation among concrete languages is provided via abstract syntax trees. Thus the same concrete grammar is effecti...
The paper presents ongoing research that aims at OWL ontology authoring and verbalization using a deterministic controlled natural language (CNL) that would be as natural and intuitive as possible. Moreover, we focus on a multilingual CNL interface to OWL by considering both highly analytical and highly synthetic languages (namely, English and L...
The research subject of this doctoral thesis is the formal, automatic grammatical and semantic analysis of the highly inflective, synthetic Latvian language. A novel hybrid grammar model is proposed, which is especially suited for languages with relatively free word order. The model has been tested on a syntactically restricted subset of Latvian, c...
Controlled natural languages (mostly English-based) have recently emerged as seemingly informal supplementary means for OWL ontology authoring, compared to the formal notations that are used by professional knowledge engineers. In this paper we present, by example, a controlled Latvian language that has been designed to be compliant with the state...
The last six years have been very important for research and development of language technologies in Latvia. Several large projects have been funded by the government of Latvia, important tools and resources have been created by the industry, and since 2006 Latvia has participated in the CLARIN initiative. Although there is still a gap in language...
The dependency approach, originally developed by Lucien Tesnière, has become a popular model of syntactic representation. However, the state-of-the-art dependency parsers and annotation schemes typically discard some relevant features of Tesnière's original model, retaining only the concept of dependency relations between individual words. The...
Computational semantics and logic-based controlled natural languages (CNL) do not systematically address the word sense disambiguation problem of content words, i.e., they tend to interpret only some functional words that are crucial for construction of discourse representation structures. We show that micro-ontologies and multi-word units allow in...
The paper proposes a representation of FrameNet as a 4D multidimensional ontology. This novel representation makes it possible both to re-create the FrameNet ontology from semantically annotated texts and to use the representation for semantic annotation of new texts. A further extension of this approach with a 5th dimension for anaphora annotation is d...
Word sense disambiguation (WSD) and methods for discourse representation of the parsed text are among the most difficult tasks in computational linguistics today. Without a satisfactory solution to these problems, the true automated semantic processing of texts, as envisioned by the Semantic Web, machine translation, or information re...
Re-engineering of successful pre-OWL ontologies or other formal ER or UML system models towards OWL DL compliance opens new possibilities in ontology debugging, enabled by the formal semantics and automated reasoners developed for OWL DL. Meanwhile the transformation of legacy ontologies to OWL DL is a challenging and interesting task, which we ill...