Andrius Utka

Vytautas Magnus University ·  Department of Lithuanian Language

PhD

37 Publications
3,162 Reads
101 Citations
Citations since 2017
16 Research Items
74 Citations
Introduction
Andrius Utka currently works at the Department of Lithuanian Language and Centre of Computational Linguistics at Vytautas Magnus University. Andrius does research in Discourse Analysis, Authorship Attribution and Computational Linguistics.
Additional affiliations
September 2010 - present
Vytautas Magnus University
Position
  • Head of centre

Publications (37)
Chapter
Full-text available
The CLARIN-LT consortium is one of the leading Lithuanian language research and digital data storage infrastructures. This chapter will present outreach and initiatives performed by or in cooperation with the CLARIN-LT consortium and highlight their most significant outcomes. We will first highlight some of the resources stored in the CLARIN-LT reposit...
Conference Paper
Full-text available
The paper presents English-Lithuanian corpora for bilingual term extraction (BiTE) in the cybersecurity domain within the framework of the project DVITAS. It is argued that a system of parallel, comparable, and training corpora for BiTE is particularly useful for less-resourced languages, as it makes it possible to efficiently combine strengths and a...
Article
Full-text available
This paper presents an overview of the LL(O)D and NLP methods, tools and data for detecting and representing semantic change, with its main application in humanities research. The paper’s aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST A...
Article
Full-text available
The paper provides results of the frequency distribution analysis of cybersecurity terms used in the Lithuanian cybersecurity corpus composed of texts of different genres. The research focuses on the following aspects: overall distribution of cybersecurity terms (their density and diversity) across genres, distribution of English and English-Lith...
Article
The aim of the paper is to estimate the number of words in Lithuanian texts indexed by the selected Global Search Engines (GSE), namely Google (by Alphabet Inc.), Bing (by Microsoft Corporation), and Yandex (by Yandex, Russia). For this purpose, a special list of 100 rare Lithuanian words (pivot words) with specific characteristics was compiled...
Article
Full-text available
The aim of the paper is to present a methodological framework for the development of an English-Lithuanian bilingual termbase in the cybersecurity domain, which can be applied as a model for other language pairs and other specialised domains. It is argued that the presented methodological approach can ensure creation of high-quality bilingual termb...
Article
Full-text available
The paper presents the results of research on deep learning methods aiming to determine the most effective one for automatic extraction of Lithuanian terms from a specialized domain (cybersecurity) with very restricted resources. A semi-supervised approach to deep learning was chosen for the research as Lithuanian is a less resourced language and l...
Chapter
The paper presents an overview of the development and research in Lithuanian language technologies for the period 2016–2020. The most significant national and international LT related initiatives, projects, research infrastructures, language resources and tools are discussed. The paper also surveys research production in the field of language techn...
Book
A language technology bibliography for the Lithuanian language covering the period 2016-2020. The resource is in BibTeX format and it contains: 1) 91 references of research publications, 2) 15 references of documents and strategies, and 3) 26 references of language resources and tools. The resource is used for the paper: Utka, Andrius, Jurgita Vaičenonienė...
Conference Paper
The paper presents an overview of recent advances of language technologies in Lithuania. It is shown that the development of Lithuanian language resources and technologies can be divided into three stages: the first (2004-2012), the second (2012-2015), and the third (2016-2020). The paper focuses on the second stage of development, which is labelle...
Conference Paper
Full-text available
In this paper, we present preliminary research on the normalisation of Lithuanian social media texts. Specifically, the paper deals with language normalisation issues in Lithuanian user-generated comments on three popular websites: Lietuvos Rytas (Lithuanian Morning), Verslo žinios (Business News), and Delfi.lt. We have established the propor...
Article
In this article, we present a corpus of transcripts of Seimas sittings, prepared in a special format suitable for various authorship attribution studies. The corpus consists of about 111 thousand texts (24 million words), each of which corresponds to one speech by a member of parliament during a sitting of a regular session, and covers 7 terms of the Seimas of the Republic of Lithuania: from 1990...
Article
Full-text available
The Lithuanian language is a typical flectional language that has a very sophisticated system of grammatical forms and many means of derivation; it is also characterized by uncertain boundaries between morphemes. All this makes the morphemic analysis of the Lithuanian language very complex. The aim of this research is to define and describe morphem...
Conference Paper
This paper reports the first authorship attribution results based on the effect of the author set size using automatic computational methods for the Lithuanian language. The aim is to determine how quickly authorship attribution results deteriorate as the number of candidate authors gradually increases: i.e. starting from 3, going up to...
Conference Paper
In this paper we report the first authorship attribution results for the Lithuanian language using Internet comments with a thousand candidate authors. The task is complicated due to the following reasons: large number of candidate authors, extremely short non-normative texts, and problems associated with morphologically and vocabulary rich lang...
Conference Paper
This paper reports the first authorship attribution results based on automatic computational methods for the Lithuanian language. Using supervised machine learning techniques, we experimentally investigated the influence of different feature types (lexical, character, and syntactic) focusing on a few authors within three datasets, containing tra...
Conference Paper
Full-text available
This paper explores the problem of extracting domain-specific terminology in the field of science and education from Lithuanian texts. Four different term extraction approaches have been applied and evaluated.
Conference Paper
The paper presents the development process of the 160-million-word Corpus of Contemporary Lithuanian Language (CCLL), with standardization issues being the focus of the current development phase. The paper presents problems and solutions for the process of converting the CCLL from a proprietary format into a standardised one. Challenges in encoding the corpus usin...
Article
Full-text available
As information technologies develop, large morphologically annotated corpora become a necessity for moving on to higher levels of language computerisation (e.g. automatic syntactic and semantic analysis, information extraction, machine translation). Research on morphological disambiguation and morpho...
Article
Full-text available
The vast majority of scholarly work in descriptive translation studies is product-oriented. In this article, the focus is moved from product-oriented to process-oriented translation studies by compiling an English-Lithuanian Phases of Translation Corpus (PT corpus). The PT corpus is analysed using quantitative and qualitative analyses. The qu...
Thesis
Full-text available
The main aim of this research is to identify text functions based on the distribution of Lithuanian common words and word-forms, and to test the possibility of categorizing texts according to their functional properties. The novelty of this work lies in the fact that text functions have been identified using frequency distributions of common words...
