Kamil Kanclerz

Wroclaw University of Science and Technology | WUT · Department of Artificial Intelligence

Master of Science

About

21 Publications
8,561 Reads
770 Citations
Additional affiliations
January 2019 - present
Wroclaw University of Science and Technology
Position
  • Researcher
Description
  • Member of G4.19 Scientific Group (part of the European research network CLARIN)
  • Faculty of Information and Communication Technology, Department of Computational Intelligence
  • Fields of interest:
    • Natural Language Processing / Engineering
    • Computer Vision
    • Information Extraction (recognition of Named Entities, Temporal Expressions, Events, Keywords, etc.)
    • Sentiment Analysis
    • Artificial Intelligence / Machine Learning
    • GPU-based Deep Learning
    • Data Science
    • Parallel Computing
Education
October 2020 - July 2024
Wroclaw University of Science and Technology
Field of study
  • Computer Science
January 2019 - July 2020
Wroclaw University of Science and Technology
Field of study
  • Computer Science
October 2015 - January 2019
Wroclaw University of Science and Technology
Field of study
  • Computer Science

Publications (21)
Conference Paper
Full-text available
Data annotated by humans is a source of knowledge: it describes the peculiarities of the problem and thereby fuels the decision process of the trained model. Unfortunately, the annotation process for subjective natural language processing (NLP) problems like offensiveness or emotion detection is often very expensive and time-consuming. One of t...
Conference Paper
Full-text available
This article compiles research on the extraction of human characteristics using three different methods: questionnaires, annotations, and biases. We have performed an analysis of how personalized perception of texts is affected by individual human profile and bias. To acquire comprehensive knowledge about individual user preferences, we have gath...
Chapter
Full-text available
Data Maps is an interesting method of graphical representation of datasets, which allows observing the model’s behaviour for individual instances in the learning process (training dynamics). The method groups elements of a dataset into easy-to-learn, ambiguous, and hard-to-learn. In this article, we present an extension of this method, Differential...
Article
Full-text available
Some tasks in content processing, e.g., natural language processing (NLP), like hate or offensive speech and emotional or funny text detection, are subjective by nature. Each human may perceive some content individually. The existing reasoning methods commonly rely on agreed output values, the same for all recipients. We propose fundamentally diffe...
Article
Full-text available
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
Preprint
Full-text available
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
Conference Paper
Full-text available
In recognizing hate speech in text, a frequently overlooked aspect is the specific recipient of the content. Information about the user can be considered as another potential modality in addition to the textual representation. In this work, we present the multi-modal hate speech detection problem as a task of personalized prediction based on text a...
Conference Paper
Full-text available
For subjective NLP problems, such as classification of hate speech, aggression, or emotions, personalized solutions can be exploited. Then, the learned models infer the perception of the content independently for each reader. To acquire training data, texts are commonly randomly assigned to users for annotation, which is expensive and highly...
Conference Paper
Full-text available
As humans, we experience a wide range of feelings and reactions. One of these is laughter, often related to a personal sense of humor and the perception of funny content. Due to its subjective nature, recognizing humor in NLP is a very challenging task. Here, we present a new approach to the task of predicting humor in the text by applying the idea...
Chapter
Multiword Expression (MWE) detection is a crucial problem for many NLP applications. Recent methods approach it as a sequence labeling task and require a manually annotated corpus. Traditional methods are based on statistical association measures and exhibit limited accuracy, especially on smaller corpora. In this paper, we propose a novel weakly sup...
Conference Paper
Full-text available
A unified gold standard commonly exploited in natural language processing (NLP) tasks requires high inter-annotator agreement. However, there are many subjective problems that should respect users' individual points of view. Therefore, in this paper, we evaluate three different personalized methods on the task of hate speech detection. The user-cente...
Preprint
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is v...
Conference Paper
Full-text available
Effective methods for multiword expression detection are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme applied to an annotated corpus, while traditional methods use statistical measures. In our approach, we want to integrate the concepts of those two appr...
Chapter
Effective methods for detecting multiword expressions are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme, while traditional methods use statistical measures. In our approach, we want to integrate the concepts of those two approaches. In this paper, we...
Article
Some tasks in content processing, e.g., natural language processing (NLP) tasks like hate or offensive speech and emotional or funny text detection, are subjective by nature. Each human may perceive some content in their own individual way. The existing reasoning methods commonly rely on agreed output values, the same for all recipients. We propose fundamen...
Conference Paper
Full-text available
Many tasks in natural language processing like offensive, toxic, or emotional text classification are subjective by nature. Humans tend to perceive textual content in their own individual way. Existing methods commonly rely on the agreed output values, the same for all consumers. Here, we propose personalized solutions to subjective tasks. Our four...
Conference Paper
Full-text available
Analysis of emotions elicited by opinions, comments, or articles commonly exploits annotated corpora, in which the labels assigned to documents average the views of all annotators, or represent a majority decision. The models trained on such data are effective at identifying the general views of the population. However, their usefulness for predict...
Conference Paper
Full-text available
Content such as hate speech and offensive, toxic, or aggressive documents is perceived differently by its consumers. Such content is commonly identified using classifiers based solely on textual content that generalize pre-agreed meanings of difficult problems. Such models provide the same results for each user, which leads to high misclass...
Chapter
Full-text available
This article presents MultiEmo, a new benchmark data set for the multilingual sentiment analysis task including 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original reviews in Polish contained 8,216 documents consisting of 57,466 sentences. The reviews were manually annota...
Article
Full-text available
In this article, we present a novel technique for the use of language-agnostic sentence representations to adapt the model trained on texts in Polish (as a low-resource language) to recognize polarity in texts in other (high-resource) languages. The first model focuses on the creation of a language-agnostic representation of each sentence. The seco...
