Piotr Miłkowski

Wrocław University of Science and Technology | WUT

Master of Engineering

About

Publications: 26
Reads: 10,724
Citations: 815


Publications (26)
Conference Paper
Full-text available
Designing predictive models for subjective problems in natural language processing (NLP) remains challenging. This is mainly due to their non-deterministic nature and different perceptions of the content by different humans. It may be solved by Personalized Natural Language Processing (PNLP), where the model exploits additional information about the...
Article
Full-text available
Some tasks in content processing, e.g., natural language processing (NLP), like hate or offensive speech and emotional or funny text detection, are subjective by nature. Each human may perceive some content individually. The existing reasoning methods commonly rely on agreed output values, the same for all recipients. We propose fundamentally diffe...
Article
Full-text available
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
Article
Full-text available
In this paper, we propose a sentiment analysis of Twitter data focused on the attitudes and sentiments of Polish migrants and stayers during the pandemic. We collected 9 million tweets and retweets between January and August 2021, and analysed them using MultiEmo, the multilingual, multilevel, multi-domain sentiment analysis corpus. We discovered t...
Preprint
Full-text available
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
Article
Full-text available
In this paper, we propose a sentiment analysis of Twitter data focused on the attitudes and sentiments of Polish migrants and stayers during the pandemic. We collected 9 million tweets and retweets between January and August 2021, and analysed them using MultiEmo, the multilingual, multilevel, multi-domain sentiment analysis corpus. We discovered t...
Conference Paper
Full-text available
For subjective NLP problems, such as classification of hate speech, aggression, or emotions, personalized solutions can be exploited. Then, the learned models infer the perception of the content independently for each reader. To acquire training data, texts are commonly randomly assigned to users for annotation, which is expensive and highly...
Conference Paper
Full-text available
As humans, we experience a wide range of feelings and reactions. One of these is laughter, often related to a personal sense of humor and the perception of funny content. Due to its subjective nature, recognizing humor in NLP is a very challenging task. Here, we present a new approach to the task of predicting humor in the text by applying the idea...
Chapter
Full-text available
In this work, we present an advanced semantic search engine dedicated to travel offers, allowing the user to create queries in natural language. We started with a focus on the Polish language. Search for e-commerce requires a different set of methods and algorithms than search for travel, search for corporate documents, for law documents, for med...
Chapter
Full-text available
We carried out extensive experiments on the MultiEmo dataset for sentiment analysis with texts in eleven languages. Two adapted versions of the LaBSE deep architecture were compared against the LASER model. That allowed us to conduct cross-language validation of these language-agnostic methods. The achieved results proved that LaBSE embeddings wi...
Chapter
Full-text available
In this paper, we present paragraph segmentation using cross-lingual knowledge transfer models. In our solution, we investigate the quality of multilingual models, such as mBERT and XLM-RoBERTa, as well as language-independent models, LASER and LaBSE. We study the quality of segmentation in 9 different European languages, both for each language sep...
Conference Paper
Full-text available
A unified gold standard commonly exploited in natural language processing (NLP) tasks requires high inter-annotator agreement. However, there are many subjective problems that should respect users' individual points of view. Therefore, in this paper, we evaluate three different personalized methods on the task of hate speech detection. The user-cente...
Chapter
Full-text available
We propose and test multiple neuro-symbolic methods for sentiment analysis. They combine deep neural networks – transformers and recurrent neural networks – with external knowledge bases. We show that for simple models, adding information from knowledge bases significantly improves the quality of sentiment prediction in most cases. For medium-sized...
Chapter
Full-text available
In this article we present extended results obtained on the multidomain dataset of Polish text reviews collected within the Sentimenti project. We present preliminary results of classification models trained and tested on 7,000 texts annotated by over 20,000 individuals using valence, arousal, and eight basic emotions from Plutchik’s model. Additio...
Chapter
Full-text available
We developed and validated a language-agnostic method for sentiment analysis. Cross-language experiments carried out on the new MultiEmo dataset with texts in 11 languages proved that LaBSE embeddings with an additional attention layer implemented in the BiLSTM architecture outperformed other methods in most cases. Keywords: Cross-language NLP, Sentimen...
Article
Some tasks in content processing, e.g., natural language processing (NLP), like hate or offensive speech detection and emotional or funny text detection, are subjective by nature. Each human may perceive some content in their own individual way. The existing reasoning methods commonly rely on agreed output values, the same for all recipients. We propose fundamen...
Article
Full-text available
Emotion lexicons are useful in research across various disciplines, but the availability of such resources remains limited for most languages. While existing emotion lexicons typically comprise words, it is a particular meaning of a word (rather than the word itself) that conveys emotion. To mitigate this issue, we present the Emotion Meanings data...
Conference Paper
Full-text available
Many tasks in natural language processing like offensive, toxic, or emotional text classification are subjective by nature. Humans tend to perceive textual content in their own individual way. Existing methods commonly rely on the agreed output values, the same for all consumers. Here, we propose personalized solutions to subjective tasks. Our four...
Conference Paper
Full-text available
Analysis of emotions elicited by opinions, comments, or articles commonly exploits annotated corpora, in which the labels assigned to documents average the views of all annotators, or represent a majority decision. The models trained on such data are effective at identifying the general views of the population. However, their usefulness for predict...
Chapter
Full-text available
This article presents MultiEmo, a new benchmark data set for the multilingual sentiment analysis task including 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original reviews in Polish contained 8,216 documents consisting of 57,466 sentences. The reviews were manually annota...
Article
Full-text available
In this article, we present a novel technique for the use of language-agnostic sentence representations to adapt the model trained on texts in Polish (as a low-resource language) to recognize polarity in texts in other (high-resource) languages. The first model focuses on the creation of a language-agnostic representation of each sentence. The seco...
Conference Paper
Full-text available
In this article, we present an extended version of PolEmo – a corpus of consumer reviews from 4 domains: medicine, hotels, products and school. The current version (PolEmo 2.0) contains 8,216 reviews comprising 57,466 sentences. Each text and sentence was manually annotated with sentiment in a 2+1 scheme, which gives a total of 197,046 annotations. We obtaine...
Conference Paper
Full-text available
In this article, we present a novel multi-domain dataset of Polish text reviews, annotated with sentiment on different levels: sentences and the whole documents. The annotation was made by linguists in a 2+1 scheme (with inter-annotator agreement analysis). We present a preliminary approach to the classification of labelled data using logistic regr...
Conference Paper
Full-text available
In this article, we present a novel multidomain dataset of Polish text reviews. The data were annotated as part of a large study involving over 20,000 participants. A total of 7,000 texts were described with metadata; each text received about 25 annotations concerning polarity, arousal and eight basic emotions, marked on a multilevel scale. We pres...
