Arkadiusz Janz
Wroclaw University of Science and Technology | WUT
Doctor of Philosophy
About
Publications: 29
Reads: 10,403
Citations: 571
Publications (29)
Large language models (LLMs) have significantly advanced Natural Language Processing (NLP) tasks in recent years. However, their universal nature poses limitations in scenarios requiring personalized responses, such as recommendation systems and chatbots. This paper investigates methods to personalize LLMs, comparing fine-tuning and zero-shot reaso...
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT), revolutionizing the approach to human-model interaction in artificial intelligence. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
The BEIR dataset is a large, heterogeneous benchmark for Information Retrieval (IR) in zero-shot settings, garnering considerable attention within the research community. However, BEIR and analogous datasets are predominantly restricted to the English language. Our objective is to establish large-scale resources for IR in the Polish langu...
Recent advances in Word Sense Disambiguation suggest that neural language models can be successfully improved by incorporating knowledge base structure. Such models are called hybrid solutions. We propose a method of improving hybrid WSD models by harnessing data augmentation techniques and bilingual training. The data augmentation consists of s...
The availability of compute and data to train ever larger language models increases the demand for robust methods of benchmarking the true progress of LM training. Recent years have witnessed significant progress in standardized benchmarking for English. Benchmarks such as GLUE, SuperGLUE, or KILT have become de facto standard tools to compare lar...
In this work, we present an advanced semantic search engine dedicated to travel offers, allowing users to formulate queries in natural language. We started with a focus on the Polish language. Search for e-commerce requires a different set of methods and algorithms than search for travel, corporate documents, law documents, med...
We propose and test multiple neuro-symbolic methods for sentiment analysis. They combine deep neural networks (transformers and recurrent neural networks) with external knowledge bases. We show that for simple models, adding information from knowledge bases significantly improves the quality of sentiment prediction in most cases. For medium-sized...
We introduce a comprehensive evaluation benchmark for the Polish Word Sense Disambiguation task. The benchmark consists of 7 distinct datasets with sense annotations based on plWordNet 4.2. To the best of our knowledge, our work is the first attempt to standardise existing sense-annotated data for Polish. We also follow the recent trends of neural WSD solutions and...
In this article we present extended results obtained on the multidomain dataset of Polish text reviews collected within the Sentimenti project. We present preliminary results of classification models trained and tested on 7,000 texts annotated by over 20,000 individuals using valence, arousal, and eight basic emotions from Plutchik’s model. Additio...
Emotion lexicons are useful in research across various disciplines, but the availability of such resources remains limited for most languages. While existing emotion lexicons typically comprise words, it is a particular meaning of a word (rather than the word itself) that conveys emotion. To mitigate this issue, we present the Emotion Meanings data...
Lexical resources are crucial in many modern applications of Natural Language Processing and Artificial Intelligence. We present VeSNet, a network of lexical resources resulting from merging the Polish-English WordNet (PEWN) with several existing large electronic thesauri from the Linked Open Data cloud (DBpedia, Wikipedia, GeoWordNet, Agrovoc, E...
We propose a novel method of homonymy-polysemy discrimination for three Indo-European languages (English, Spanish and Polish). Support vector machines and LASSO logistic regression were successfully used in this task, outperforming the baselines. The feature set utilised lemma properties, gloss similarities, graph distances and polysemy patterns. The p...
In this paper we present a new individual measure for the task of evocation strength prediction. The proposed solution is based on Dijkstra’s distances calculated on the WordNet graph expanded with polysemy relations. The polysemy network was constructed using a chaining procedure executed on individual word senses of polysemous lemmas. We show that...
Presentation for the article: Propagation of emotions, arousal and polarity in WordNet using Heterogeneous Structured Synset Embeddings
Relation Extraction is a fundamental NLP task. In this paper we investigate the impact of underlying text representation on the performance of neural classification models in the task of Brand-Product relation extraction. We also present the methodology of preparing annotated textual corpora for this task and we provide valuable insight into the pr...
Sentiment analysis is a hot research topic in Natural Language Processing, focused mainly on the emotive analysis of textual opinions. The task of sentiment recognition is highly domain-dependent; thus, there is a great need for methods with decent domain adaptation abilities. In this paper we present a brief overview of existing data...
In this paper we present a novel method for emotive propagation in a wordnet based on a large emotive seed. We introduce a sense-level emotive lexicon annotated with polarity, arousal and emotions. The data were annotated as part of a large study involving over 20,000 participants. A total of 30,000 lexical units in Polish WordNet were described...
According to George K. Zipf, more frequent words have more senses. We have tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese, Indonesian and Chinese. We have shown that the law holds quite well for all of these languages if we take, as Zipf did, mean values of meaning count and averaged ranks....
Automatic word sense disambiguation (WSD) has proven to be an important technique in many natural language processing tasks. For many years the problem of sense disambiguation has been approached with a wide range of methods; however, it is still a challenging problem, especially in the unsupervised setting. One of the well-known and successful app...
In this article, we present a novel multidomain dataset of Polish text reviews. The data were annotated as part of a large study involving over 20,000 participants. A total of 7,000 texts were described with metadata, each text received about 25 annotations concerning polarity, arousal and eight basic emotions, marked on a multilevel scale. We pres...
In this paper we present a novel approach to the construction of an extensive, sense-level sentiment lexicon built on the basis of a wordnet. The main aim of this work is to create a high-quality sentiment lexicon in a partially automated way. We propose a method called Classifier-based Polarity Propagation, which utilises a very rich set of wordne...
This paper presents a supervised approach to the recognition of Cross-document Structure Theory (CST) relations in Polish texts. Its core is a graph-based representation constructed for sentences. Graphs are built on the basis of lexicalised syntactic-semantic relations extracted from text. Similarity between sentences is calculated as similarity b...
In this paper we present a comprehensive overview of recent methods of sentiment propagation in a wordnet. Next, we propose a fully automated method called Classifier-based Polarity Propagation, which utilises a very rich set of features, most of which are based on wordnet relation types, multi-level bag-of-synsets and bag-of-polarities....
In this paper we present our attempts in the PolEval 2017 Sentiment Analysis Task. The task is not only one of the first challenges in sentiment analysis focused on the Polish language, but also represents a novel approach to sentiment analysis, namely, predicting the sentiment not of a sentence or a document, but of a word or a phrase within the cont...
We present a large emotive lexicon of Polish which has been constructed by manual expansion of the emotive annotation defined for plWordNet 3.0 emo (a very large wordnet of Polish). The annotation encompasses: sentiment polarity, basic emotions and fundamental human values. The annotation scheme and revised guidelines for the annotation process are dis...