Przemysław Kazienko
Professor
Wrocław University of Science and Technology (WUT) · Department of Computational Intelligence
Publications: 287
Reads: 111,595
Citations: 5,219
Introduction
Data science, machine learning, deep machine learning, emotion detection, wearables, social media analysis, NLP, recommender systems
Publications (287)
Ubiquitous sensing from wearable devices in the wild holds promise for enhancing human well-being, from diagnosing clinical conditions and measuring stress to building adaptive health-promoting scaffolds. However, the large volumes of data collected across heterogeneous contexts pose challenges for conventional supervised learning approaches. Representati...
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples identified through a reference-free consistency metho...
The rapid evolution of large language models, in particular OpenAI’s GPT-3.5-turbo and GPT-4, indicates a growing interest in advanced computational methodologies. This paper proposes a novel approach to synthetic data generation and knowledge distillation through prompt engineering. The potential of large language models (LLMs) is used to address...
We address the main problem of self-learning LLMs: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through self-assessment of its own hallucinations. Using the hallucination score, we introduce a new concept of Points in The Unknown (PiUs), along with o...
Large language models (LLMs) have significantly advanced Natural Language Processing (NLP) tasks in recent years. However, their universal nature poses limitations in scenarios requiring personalized responses, such as recommendation systems and chatbots. This paper investigates methods to personalize LLMs, comparing fine-tuning and zero-shot reaso...
Large language models are experiencing a significant surge of attention and rapid development, mainly due to the release of OpenAI's ChatGPT models: GPT-3.5-turbo and GPT-4. This article uses prompt engineering to present an innovative approach to synthetic data generation and knowledge distillation. Specifically, we focus on thr...
The performance of machine learning models is closely linked to the quality of training data, as captured by the 'garbage in, garbage out' principle. Label noise in datasets is a key challenge in training and evaluation. This study introduces two innovative ChatGPT-based methods, ChatGPT-Predict and ChatGPT-Detect, for effective noise detection in la...
The development of large language models, such as ChatGPT (GPT-3.5) and GPT-4, has revolutionized natural language processing (NLP) and opened up new possibilities in various fields. These models demonstrate remarkable capabilities in generating coherent and contextually relevant text, making them suitable for a wide range of applications. This wor...
Designing predictive models for subjective problems in natural language processing (NLP) remains challenging. This is mainly due to their non-deterministic nature and the different ways in which individual humans perceive the content. This may be addressed by Personalized Natural Language Processing (PNLP), where the model exploits additional information about the...
Human-annotated data is a source of knowledge: it describes the peculiarities of the problem and thereby fuels the decision process of the trained model. Unfortunately, the annotation process for subjective natural language processing (NLP) problems like offensiveness or emotion detection is often very expensive and time-consuming. One of t...
The vast area of subjectivity in Natural Language Processing (NLP) poses a challenge to the solutions typically used in generalized tasks. Since research on generalized NLP is much more advanced, a tremendous gap remains to be addressed across all feasible tasks where an opinion, taste, or feelings are inherent, thu...
This article compiles research on the extraction of human characteristics using three different methods: questionnaires, annotations, and biases. We have analyzed how the personalized perception of texts is affected by individual human profiles and biases. To acquire comprehensive knowledge about individual user preferences, we have gath...
Data Maps is an interesting method of graphical representation of datasets, which allows observing the model’s behaviour for individual instances in the learning process (training dynamics). The method groups elements of a dataset into easy-to-learn, ambiguous, and hard-to-learn. In this article, we present an extension of this method, Differential...
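For readers unfamiliar with the base method that Differential Data Maps extends, here is a minimal sketch of one common formulation of Data Maps (an assumption; the summary above gives no formulas). Each instance receives a confidence (mean probability of its gold label across training epochs) and a variability (standard deviation of that probability), and these two statistics place it in one of the three regions. The function name `data_map` and the thresholds `var_threshold=0.2` and `conf_threshold=0.5` are hypothetical choices for this illustration, not values from the paper.

```python
import numpy as np


def data_map(gold_probs, var_threshold=0.2, conf_threshold=0.5):
    """Assign each training instance to a Data Maps region.

    gold_probs: array-like of shape (n_epochs, n_samples) holding the model's
    probability of the gold label for each sample, recorded after each epoch.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    probs = np.asarray(gold_probs, dtype=float)
    confidence = probs.mean(axis=0)   # mean gold-label probability over epochs
    variability = probs.std(axis=0)   # how much that probability fluctuates
    # High variability -> ambiguous; otherwise split by confidence.
    regions = np.where(
        variability >= var_threshold,
        "ambiguous",
        np.where(confidence >= conf_threshold, "easy-to-learn", "hard-to-learn"),
    )
    return confidence, variability, regions.tolist()


# Hypothetical gold-label probabilities: 4 epochs (rows) x 3 samples (columns).
gold_probs = [
    [0.90, 0.10, 0.20],
    [0.95, 0.90, 0.10],
    [0.97, 0.15, 0.15],
    [0.99, 0.85, 0.10],
]
conf, var, regions = data_map(gold_probs)
print(regions)  # ['easy-to-learn', 'ambiguous', 'hard-to-learn']
```

In practice, the confidence/variability pairs are usually plotted as a scatter map, which is the graphical representation the summary refers to; the hard thresholds here only make the three groups explicit.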
Some tasks in content processing, e.g., natural language processing (NLP), like hate or offensive speech and emotional or funny text detection, are subjective by nature. Each human may perceive some content individually. The existing reasoning methods commonly rely on agreed output values, the same for all recipients. We propose fundamentally diffe...
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transforme...
For subjective NLP problems, such as classification of hate speech, aggression, or emotions, personalized solutions can be exploited. The learned models then infer the perception of the content independently for each reader. To acquire training data, texts are commonly assigned to users at random for annotation, which is expensive and highly...
As the popularity of wearables increases, so does their utility for studying emotions. The use of new technologies raises several ethical challenges that should be considered to improve research designs. There are several ethical recommendations for utilizing wearables to study human emotions, but they focus on emotion recognition system applications rather...
As humans, we experience a wide range of feelings and reactions. One of these is laughter, often related to a personal sense of humor and the perception of funny content. Due to its subjective nature, recognizing humor in NLP is a very challenging task. Here, we present a new approach to the task of predicting humor in the text by applying the idea...
Cardiac monitoring based on wearable photoplethysmography (PPG) is widespread because of its usability and low cost. Unfortunately, PPG is negatively affected by various types of disruptions, which could introduce errors into the algorithm that extracts pulse rate variability (PRV). This study aims to identify the nature of such artifacts caused by v...
We carried out extensive experiments on the MultiEmo dataset for sentiment analysis with texts in eleven languages. Two adapted versions of the LaBSE deep architecture were compared with the LASER model. This allowed us to conduct cross-language validation of these language-agnostic methods. The achieved results proved that LaBSE embeddings wi...
Cardiac monitoring based on wearable photoplethysmography (PPG) is widespread because of its usability and low cost. Unfortunately, PPG is negatively affected by various types of disruptions, which could introduce errors into the algorithm that extracts Pulse Rate Variability (PRV). This study aims to identify the nature of such artifacts caused by v...
A unified gold standard, commonly exploited in natural language processing (NLP) tasks, requires high inter-annotator agreement. However, many subjective problems should respect users' individual points of view. Therefore, in this paper, we evaluate three different personalized methods on the task of hate speech detection. The user-cente...
We propose and test multiple neuro-symbolic methods for sentiment analysis. They combine deep neural networks – transformers and recurrent neural networks – with external knowledge bases. We show that for simple models, adding information from knowledge bases significantly improves the quality of sentiment prediction in most cases. For medium-sized...
This work develops the concept of the temporal network epistemology model enabling the simulation of the learning process in dynamic networks. The results of the research, conducted on the temporal social network generated using the CogSNet model and on the static topologies as a reference, indicate a significant influence of the network temporal d...
Smart wearables, equipped with sensors monitoring physiological parameters, are becoming an integral part of our life. In this work, we investigate the possibility of utilizing such wearables to recognize emotions in the wild. In most reviewed papers, the authors apply a similar procedure consisting of participant recruitment, stimuli preparation a...
Emotion recognition in real life is challenging since training machine learning models requires many annotated samples with experienced emotions. Although collecting such data is a difficult task, we may improve the process by utilizing a pre-trained model detecting emotional events. We conducted a study to test whether employing machine learning m...
The Emognition dataset is dedicated to testing methods for emotion recognition (ER) from physiological responses and facial expressions. We collected data from 43 participants who watched short film clips eliciting nine discrete emotions: amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, and sadness. Three wearables were used to r...
This work develops the concept of the temporal network epistemology model, enabling the simulation of the learning process in dynamic networks. The results of the research, conducted on the temporal social network generated using the CogSNet model and on the static topologies as a reference, indicate a significant influence of the network temporal dynam...
We propose WEIG – Wroclaw Effectiveness Indicator for Grants. This new scientometric measure is an aggregated quality measure of scientific papers published with the grant support divided by its budget. Several WEIG variations have been considered with respect to journal quality indicators like Impact Factor (IF), Article Influence Score (AIS), and...
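As a rough illustration of the ratio described above, assuming the aggregation is a plain sum over the grant's papers (the summary does not specify the aggregation), one WEIG variant for a grant $g$ could be written as follows. The symbols $P_g$ (papers published with the support of grant $g$), $q(p)$ (the chosen journal quality indicator, e.g. IF or AIS, for the venue of paper $p$) and $B_g$ (the grant budget) are introduced here only for illustration, and the numbers in the second line are hypothetical.

```latex
\mathrm{WEIG}_q(g) = \frac{1}{B_g} \sum_{p \in P_g} q(p)
% hypothetical example: two papers with q = 2.5 and q = 4.5, budget B_g = 200\,000
\mathrm{WEIG}_q(g) = \frac{2.5 + 4.5}{200\,000} = 3.5 \times 10^{-5}
```

Swapping $q$ between indicators such as IF and AIS yields the different WEIG variations the summary mentions.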
We developed and validated a language-agnostic method for sentiment analysis. Cross-language experiments carried out on the new MultiEmo dataset with texts in 11 languages proved that LaBSE embeddings with an additional attention layer implemented in the BiLSTM architecture outperformed other methods in most cases.
Keywords: Cross-language NLP, Sentimen...
Some tasks in content processing, e.g., natural language processing (NLP), like hate or offensive speech and emotional or funny text detection, are subjective by nature. Each human may perceive some content in their own individual way. The existing reasoning methods commonly rely on agreed output values, the same for all recipients. We propose fundamen...
Many tasks in natural language processing like offensive, toxic, or emotional text classification are subjective by nature. Humans tend to perceive textual content in their own individual way. Existing methods commonly rely on the agreed output values, the same for all consumers. Here, we propose personalized solutions to subjective tasks. Our four...
Smartphones have become an integral part of our lives. One of their crucial functionalities is sharing data. We analyze the communication modules in Android devices (WiFi, Bluetooth, NFC) in terms of parallel data streaming capabilities. We find that increasing the number of concurrent threads reduces the broadcast time, but also consumes a lot of...
Analysis of subjective texts like offensive content or hate speech is a great challenge, especially regarding the annotation process. Most current annotation procedures are aimed at achieving a high level of agreement in order to generate a high-quality reference source. However, the annotation guidelines for subjective content may restrict the anno...
Analysis of emotions elicited by opinions, comments, or articles commonly exploits annotated corpora, in which the labels assigned to documents average the views of all annotators, or represent a majority decision. The models trained on such data are effective at identifying the general views of the population. However, their usefulness for predict...
Some content, such as hate speech or offensive, toxic, or aggressive documents, is perceived differently by its consumers. It is commonly identified using classifiers based solely on textual content that generalize pre-agreed meanings of difficult problems. Such models provide the same results for each user, which leads to high misclass...
Scientific output, as measured in research published annually, has grown consistently for decades. As more manuscripts are submitted for publication each year, new publishing venues appear – often as increasingly specialised offshoots of existing journals and conferences. This situation presents scholars with a wealth of publishing venues...
Recently, a variety of model designs and methods have blossomed in the context of the sentiment analysis domain. However, there is still a lack of comprehensive studies of Aspect-based Sentiment Analysis. We want to fill this gap and propose a comparison with ablation analysis of Aspect Term Extraction using various text embedding methods. We part...
Human relations are driven by social events—people interact, exchange information, share knowledge and emotions, and gather news from mass media. These events leave traces in human memory, the strength of which depends on cognitive factors such as emotions or attention span. Each trace continuously weakens over time unless another related event act...
To further extend the applicability of wearable sensors in various domains such as mobile health systems and the automotive industry, new methods for accurately extracting subtle physiological information from these wearable sensors are required. However, the extraction of valuable information from physiological signals is still challenging: smartph...
In our study, we examine the impact of citation network structures on the ability to discern valuable research topics in Computer Science literature. We use the bibliographic information available in the DBLP database to extract candidate phrases from scientific paper abstracts. Following that, we construct citation networks based on direct citatio...
Wearables equipped with pervasive sensors enable us to monitor physiological and behavioral signals. In this study, we reviewed 55 off-the-shelf devices for the recognition and analysis of emotion, stress, meditation, sleep, and physical activity, especially in field studies. Their usability stems directly from the types of sensors they possess as well a...
A high percentage of information that propagates through a social network is sourced from different exogenous sources. For example, individuals may form their opinions about products based on their own experience or on reading a product review, and then share that with their social network. This sharing then diffuses through the network, evolving as a combin...
Wearables like smartwatches or wrist bands equipped with pervasive sensors enable us to monitor our physiological signals. In this study, we address the question of whether they can help us recognize our emotions in everyday life for ubiquitous computing. Using a systematic literature review, we identified crucial research steps and discussed th...
The advancement of science, as outlined by Popper and Kuhn, is largely qualitative, but with bibliometric data, it is possible and desirable to develop a quantitative picture of scientific progress. Furthermore, it is also important to allocate finite resources to research topics that have the growth potential to accelerate the process from scienti...
In a world in which acceptance and identification with social communities are highly desired, the ability to predict the evolution of groups over time is a vital but very complex research problem. Therefore, we propose a new, adaptable, generic, and multistage method for Group Evolution Prediction (GEP) in complex networks, that f...
Recently, a variety of model designs and methods have blossomed in the context of the sentiment analysis domain. However, there is still a lack of wide and comprehensive studies of aspect-based sentiment analysis (ABSA). We want to fill this gap and propose a comparison with ablation analysis of aspect term extraction using various text embedding m...
We propose a novel approach to generate aspect hierarchies that proved to be consistently correct compared with human-generated hierarchies. We present an unsupervised technique using Rhetorical Structure Theory and graph analysis. We evaluated our approach based on 100,000 reviews from Amazon and achieved an astonishing 80% coverage compared with...
We proposed a new, accurate aspect extraction method that makes use of both word and character-based embeddings. We have conducted experiments on various models of aspect extraction using LSTM and BiLSTM, including CRF enhancement, on five different pre-trained word embeddings extended with character embeddings. The results revealed that BiLSTM outper...
Prediction over edges and nodes in graphs requires an appropriate and efficiently obtained data representation. Recent research on representation learning for dynamic networks has resulted in significant progress. However, the more precise and accurate the methods, the greater their computational and memory complexity. Here, we introduce ICMEN - the first-in-clas...
We claim that networks are created according to the priority attachment mechanism. We introduce a simple model, which uses priority attachment to generate both synthetic and close-to-empirical networks. Priority attachment is a mechanism that generalizes previously proposed mechanisms, such as small world creation or preferential attachment,...
The advancement of science, as outlined by Popper and Kuhn, is largely qualitative, but with bibliometric data it is possible and desirable to develop a quantitative picture of scientific progress. Furthermore, it is also important to allocate finite resources to research topics that have growth potential, to accelerate the process from scientific bre...