
Jurgita Kapočiūtė-Dzikienė
- Professor at Vytautas Magnus University
Publications: 52
Reads: 12,387
Citations: 424
Publications (52)
Featured Application
Leveraging fine-tuned large language models, this study enhances the prediction of efficient chemical synthesis procedures, saving time and resources in laboratory workflows. The findings highlight the potential of tailored AI approaches in transforming organic synthesis through intelligent, data-driven methods.
Abstract
In optimizing organic chemical synthesis, researchers often face challenges in efficiently generating viable synthesis procedures that conserve time and resources in laboratory settings. This paper systematically analyzes multiple approaches to efficiently generate synthesis procedures for a wide variety of organic synthesis reactions, aiming to de...
This study aims to address challenges in media monitoring by enhancing closed-set topic classification in multilingual contexts (where both training and testing occur in several languages) and crosslingual contexts (where training is in English and testing spans all languages). To achieve this goal, we utilized a dataset from the European Media Mon...
This paper presents a novel approach to predicting esterification procedures in organic chemistry by employing generative large language models (LLMs) to interpret and translate SMILES molecular notation into detailed procedural texts of synthesis reactions. Esterification reaction is important in producing various industrial intermediates, fragran...
This article proposes a methodology that uses machine learning algorithms to extract actions from structured chemical synthesis procedures, thereby bridging the gap between chemistry and natural language processing. The proposed pipeline combines ML algorithms and scripts to extract relevant data from USPTO and EPO patents, which helps transform ex...
The analysis of emotions expressed in natural language text, also known as sentiment analysis, is a key application of natural language processing (NLP). It involves assigning a positive, negative (sometimes also neutral) value to opinions expressed in various contexts such as social media, news, blogs, etc. Despite its importance, sentiment analys...
Due to the fast pace of life and online communication, and the prevalence of English and the QWERTY keyboard, people tend to forgo diacritics and make typographical errors (typos) when typing. Restoring diacritics and correcting spelling are important for proper language use and for disambiguating texts for both humans and downstream algorithms. Howe...
The spread of automatically generated clickbait, fake news, and fake reviews undermines the credibility of the internet as a source of information. We investigate the problem of recognizing automatically generated short texts by exploring different Deep Learning models. To improve the classification results, we use text augmentation techn...
Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on the assumptions that (1) training data can be obtained by machine-translating it from another language; (2) there are cross-lingual sol...
In this research, a process for developing normal-phase liquid chromatography solvent systems has been proposed. In contrast to the development of conditions via thin-layer chromatography (TLC), this process is based on the architecture of two hierarchically connected neural network-based components. Using a large database of reaction procedures al...
Accurate price evaluation of real estate is beneficial for many parties involved in real estate business such as real estate companies, property owners, investors, banks, and financial institutes. Artificial Neural Networks (ANNs) have shown promising results in real estate price evaluation. However, the performance of ANNs greatly depends upon the...
Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging—which is the process of mapping words to their corresponding POS labels depending on the context. Despite recent development of language technologies, low-resourced languages (such as an East African...
Accurate intent detection-based chatbots are usually trained on larger datasets that are not available for some languages. Seeking the most accurate models, three English benchmark datasets that were human-translated into four morphologically complex languages (i.e., Estonian, Latvian, Lithuanian, Russian) were used. Two types of word embeddings (f...
Deep Neural Networks have demonstrated great efficiency in many NLP tasks for various languages. Unfortunately, some resource-scarce languages, such as Tigrinya, still receive too little attention; therefore, many NLP applications, such as part-of-speech tagging, are in their early stages. Consequently, the main objective of this research is to offer th...
In this paper, we tackle an intent detection problem for the Lithuanian language with real supervised data. Our main focus is on the enhancement of the Natural Language Understanding (NLU) module, responsible for the comprehension of users' questions. The NLU model is trained with a properly selected word vectorization type and Deep Neural Netw...
In this paper we present the dependency parsing experiments done on the Lithuanian treebank Alksnis v2.1. This treebank contains 84 different fine-grained syntactic labels (and can be generalized to 10 coarse-grained), is annotated with lemmas, parts-of-speech and other fine-grained morphological information (cases, numbers, tenses, etc.). During o...
Accurate generative chatbots are usually trained on large datasets of question–answer pairs. Although such datasets do not exist for some languages, this does not reduce the need for companies to have chatbot technology on their websites. However, companies usually own small domain-specific datasets (at least in the form of an FAQ) about their product...
Human beings can generalize from one action to similar ones. Robots cannot do this and progress concerning information transfer between robotic actions is slow. We have designed a system that performs action generalization for manipulation actions in different scenarios. It relies on an action representation for which we perform code-snippet replac...
We describe the sentiment analysis experiments that were performed on the Lithuanian Internet comment dataset using traditional machine learning (Naïve Bayes Multinomial-NBM and Support Vector Machine-SVM) and deep learning (Long Short-Term Memory-LSTM and Convolutional Neural Network-CNN) approaches. The traditional machine learning techniques wer...
We describe experiments in sentiment analysis of Lithuanian texts using deep learning methods: Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). Methods used with pre-trained Lithuanian neural word embeddings are tested with different pre-processing techniques: emoticons restoration, stop words removal, diacritics restor...
The paper describes the results of experiments in applying the Random Projection (RP) method for authorship identification of online texts. We propose using RP for feature dimensionality reduction to low-dimensional feature subspace combined with probability density function (PDF) estimation for identification of the features of each author. In our...
Neural network-based word embeddings, which outperform traditional approaches in various Natural Language Processing tasks, have gained a lot of interest recently. Despite this, Lithuanian word embeddings have never been obtained and evaluated before. Here we have used the Lithuanian corpus of ~234 thousand running words and produced se...
This paper presents author profiling research done on Lithuanian texts using automatic machine learning methods. Our research is novel and challenging for the following reasons: 1) a large number of author profiling dimensions, i.e., gender, age, education, marital status and personality type; 2) very short (avg. ~24 tokens) non-normative tex...
We will demonstrate several morphological analyzers of languages for which morphological analysis is very difficult, and/or that are under-resourced. It will cover at least French, German, Khmer, Lao, Lithuanian, Portuguese, Quechua, Spanish and Russian. These morphological analyzers all run on the collaborative platform lingwarium.org that support...
In this research we compare two approaches (character-based machine learning and language modeling) and, based on their results, offer the best solution to the diacritization problem. Parameters of the tested approaches (i.e., a wide variety of feature types for the character-based method and a value n for the n-gram language...
The rhythmicity of written text is still an under-researched topic, in contrast to similar research in the speech analysis domain. The paper presents a method for text deconstruction into text modes using Empirical Mode Decomposition (EMD). First, the text is encoded into a numerical sequence using a mapping table. Next, the r...
This paper reports comparative authorship attribution results obtained on the Internet comments of the morphologically complex Lithuanian language. We have explored the impact of machine learning and similarity-based approaches on the different author set sizes (containing 10, 100, and 1,000 candidate authors), feature types (lexical, morphological...
The Internet can be misused by cybercriminals as a platform to conduct illegitimate activities (such as harassment, cyberbullying, and incitement of hate or violence) anonymously. As a result, authorship analysis of anonymous texts on the Internet (such as emails and forum comments) has attracted significant attention in the digital forensic and text mining...
This paper describes the baseline dictionary-based Lithuanian lemmatizer designed for an open online collaborative Machine Translation system. We evaluated our tool on the gold standard corpus composed of four different domains (official documents, fiction texts, scientific texts, and periodicals) containing ~1 million running words in total and ob...
In this paper we present comparative research disclosing the strengths and weaknesses of the two most popular publicly available Lithuanian morphological analyzers, namely Lemuoklis and Semantika.lt. Their lemmatization, part-of-speech tagging, and fine-grained annotation of morphological categories (such as case, gender, tense, et...
This paper describes gender detection research done on Lithuanian texts using automatic machine learning methods. The main contribution of our work is the investigation of very short (avg. ~39 tokens) non-normative texts. In this paper we analyze a fundamental problem: how to choose automatic methods (in particular, classifiers...
We address the problem of robots executing instructions written for humans. The goal is to simplify and speed-up the process of robot adaptation to certain tasks, which are described in human language. We propose an approach, where semantic roles are attached to the components of instructions which lead to robotic execution. However, extraction of...
In this article we present a corpus of transcripts of Seimas sittings, prepared in a special format suitable for various authorship attribution studies. The corpus consists of about 111 thousand texts (24 million words), each of which corresponds to one speech by a member of parliament during a sitting of a regular session, and covers 7 terms of the Seimas of the Republic of Lithuania: from 1990...
Instructions written in human-language cause no perception problems for humans, but become a challenge when translating them into robot executable format. This complex translation process covers different phases, including instruction completion by adding obligatory information that is not explicitly given in human-oriented instructions. Robot acti...
In this paper we report the first authorship attribution results for the Lithuanian language using Internet comments with a thousand candidate authors. The task is complicated for the following reasons: a large number of candidate authors, extremely short non-normative texts, and problems associated with morphologically and vocabulary rich lang...
A number of recent research works have used supervised machine learning approaches with a bag-of-words representation to classify political texts (in particular, speeches and debates) by their ideological position, expressed as party membership. However, our classification task is more complex for several reasons. First, we deal with the Lithuanian lan...
This paper reports the first authorship attribution results based on the automatic computational methods for the Lithuanian language. Using supervised machine learning techniques we experimentally investigated the influence of different feature types (lexical, character, and syntactic) focusing on a few authors within three datasets, containing tra...
This paper presents a hybrid Maximum Power Point Tracking (MPPT) method for improving the power-conversion efficiency of Photovoltaic (PV) generators. By detecting the output power changes caused by environmental reasons, the proposed method performs variable-step online search process with an accurate estimation of the Maximum Power Point (MPP) lo...
We present the first statistical dependency parsing results for Lithuanian, a morphologically rich language in the Baltic branch of the Indo-European family. Using a greedy transition-based parser, we obtain a labeled attachment score of 74.7 with gold morphology and 68.1 with predicted morphology (77.8 and 72.8 unlabeled). We investigate the usefu...
Despite many years of research and many effective methods proposed to solve topic classification tasks for such widely used languages as English, there is no clear answer whether these methods are suitable for languages that are substantially different. We attempt to solve a topic classification task for Lithuanian, a relatively resource-scarce lan...
Reinforcement-based agents have difficulties in transferring their acquired knowledge into new different environments due to the common identities-based percept representation and the lack of appropriate generalization capabilities. In this paper, the problem of knowledge transferability is addressed by proposing an agent dotted with decision tree...
The amount of information that is created, used or stored is growing exponentially, and the types of data sources are diverse. Most of it is available as unstructured text. Moreover, a considerable part of it is available online, usually accessible as Internet resources. It is too expensive or even impossible for humans to analyze all the resources fo...
This paper investigates the ability of an agent to recognize unseen goal states in an observable grid-world environment. This ability is important in order to allow the explicit planning of agent's action sequences. Grid-world states are described by the collection of attributes instead of "atomic" labels. This attribute-based state representation...
We present a new algorithm that follows the "divide and conquer" machine learning approach and exhibits a few interesting cognitive properties. The algorithm aims at building a decision tree with only one terminal node per class. Splits of tree nodes are constrained to functions that take identical values (true or false) for every instance within the...
In this paper we present an algorithm that automatically recognizes and annotates person and place names, contractions, acronyms, foreign language phrases, dates and sentence boundaries in Lithuanian texts. The algorithm is based on a set of manually developed template matching rules and a few specialized lexicons. The algorithm performs annotation...