Conference Paper


We present a large emotive lexicon of Polish, constructed by manually expanding the emotive annotation defined for plWordNet 3.0 emo (a very large wordnet of Polish). The annotation encompasses sentiment polarity, basic emotions, and fundamental human values. The annotation scheme and the revised guidelines for the annotation process are discussed. We also present statistics on the current state of development. Finally, we introduce the idea of a second plWordNet-based emotive lexicon created in controlled experiments. A method for selecting word senses for the experiments is proposed and evaluated.
... Our dataset provides the annotations from 25 unique annotators who are students from different countries with different cultures, ages, and characteristics. The annotation strategy followed the procedures proposed by Janz et al. (2017) and Zaśko-Zielińska and Piasecki (2018). ...
... They were not remunerated; annotators were only graded based on the number of annotations during one of the study tutorials. The annotation schema was based on the procedures used in (Janz et al., 2017; Zaśko-Zielińska and Piasecki, 2018). Each annotator received a subset of 400 reviews and was asked to annotate it according to their own personal emotional reaction to the given text. ...
Conference Paper
Humans' emotional perception is subjective by nature: each individual may express different emotions regarding the same textual content. Existing datasets for emotion analysis commonly depend on a single ground truth per data sample, derived from majority voting or averaging the opinions of all annotators. In this paper, we introduce a new non-aggregated dataset, namely StudEmo, that contains 5,182 customer reviews, each annotated by 25 people with intensities of eight emotions from Plutchik's model, extended with valence and arousal. We also propose three personalized models that use not only textual content but also the individual human perspective, providing the model with different approaches to learning human representations. The experiments were carried out as a multitask classification on two datasets: our StudEmo dataset and the GoEmotions dataset, which contains 28 emotional categories. The proposed personalized methods significantly improve prediction results, especially for emotions that have low inter-annotator agreement.
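The difference between a single aggregated ground truth and a non-aggregated dataset can be illustrated with a minimal sketch. The toy data, the 0-3 intensity scale, and the names `annotations`, `aggregate`, and `non_aggregated` are assumptions made for illustration; they are not taken from StudEmo or from the authors' code.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: text id -> {annotator id: intensity of one emotion, 0-3}.
annotations = {
    "review_1": {"a1": 0, "a2": 3, "a3": 3},
    "review_2": {"a1": 1, "a2": 1, "a3": 2},
}


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def aggregate(data):
    """Collapse all annotators into one label per text (majority and mean)."""
    result = {}
    for text_id, per_annotator in data.items():
        labels = list(per_annotator.values())
        result[text_id] = {"majority": majority_vote(labels), "mean": mean(labels)}
    return result


def non_aggregated(data):
    """Keep one (text, annotator, label) row per individual judgment."""
    return [
        (text_id, annotator_id, label)
        for text_id, per_annotator in data.items()
        for annotator_id, label in per_annotator.items()
    ]


if __name__ == "__main__":
    print(aggregate(annotations))
    print(non_aggregated(annotations))
```

In the aggregated view, annotator `a1`'s dissenting judgment on `review_1` disappears; the non-aggregated rows keep it, which is the kind of signal a personalized model could learn from.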
... The annotation schema was based on the procedures most widely used in previous studies aiming to create the first datasets of Polish words annotated in terms of emotion (NAWL [34]; NAWL BE [40]; plWordNet-emo [43,13]). Thus, we collected extensive annotations of valence, arousal, as well as eight emotion categories: joy, sadness, anger, disgust, fear, surprise, anticipation, and trust. ...
... Currently, a subset of 6,000 annotated word meanings is available in open access along with the accompanying publication [41]. Also, a small number of text reviews (100 documents) translated into 11 languages with full annotations were made available as part of the work [21] on the CLARIN-PL repository. ...
In this article we present extended results obtained on the multidomain dataset of Polish text reviews collected within the Sentimenti project. We present preliminary results of classification models trained and tested on 7,000 texts annotated by over 20,000 individuals using valence, arousal, and eight basic emotions from Plutchik’s model. Additionally, we present an extended evaluation using deep neural multilingual models and language-agnostic regressors on the translation of the original collection into 11 languages.
Keywords: NLP, Text classification, Text regression, Deep learning, Emotions, Valence, Arousal, Multilingual, Language-agnostic
... Another lexicon used was the Polish plWordNet lexicon (Janz et al., 2017; Rudnicka et al., 2019). This is a lexicon based on Princeton WordNet in English, and the version used was 4.2. ...
... Lexical resources for sentiment analysis are primarily WordNets containing emotion annotation, e.g., SentiWordNet [26], WordNet-Affect [27], or plWordNet 4.0 Emo [28], [29], whose design facilitates the propagation of results using automated methods. It is noted, however, that such resources are often not fully covered by emotion annotations and are often available only for certain languages, which has prompted the construction of further multilingual corpora [30]; they also sometimes have a limited range of annotations that includes only positive, negative, or neutral affect [31]. ...
Conference Paper
Aspect-based sentiment analysis (ABSA) is a text analysis method that categorizes data by aspects and identifies the sentiment assigned to each aspect. It can be used to analyze customer opinions by associating specific sentiments with different aspects of a product or service. Most work on this topic has been done for English, and many low-resource languages still lack adequate annotated data to create automatic methods for the ABSA task. In this work, we present annotation guidelines for the ABSA task for Polish and preliminary annotation results in the form of the AspectEmo corpus, containing over 1.5k consumer reviews annotated with over 63k annotations. We present an agreement analysis on the resulting annotated corpus and preliminary results using transformer-based models trained on AspectEmo.
... Classic methods do not consider context and word order, e.g. the bag-of-words model (Zhang et al., 2010) or TF-IDF (Sahlgren et al., 2018). The representation may be extended with additional ontologies (Bloehdorn and Hotho, 2004) or WordNets (Scott and Matwin, 1998; Piasecki et al., 2009; Misiaszek et al., 2014; Janz et al., 2017; Kocoń et al., 2019b) and used with SVM (Razavi et al., 2010) or logistic regression models (Waseem and Hovy, 2016; Sahlgren et al., 2018; Kocoń et al., 2018; Kocoń and Maziarz, 2021). New methods often use word embeddings (Wiegand et al., 2018; Bojanowski et al., 2017; Augustyniak et al., 2021) mixed with character embeddings (Augustyniak et al., 2019), together with deep neural networks, e.g. ...
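The remark that bag-of-words and TF-IDF ignore word order can be made concrete with a minimal sketch. It assumes whitespace tokenization and one common TF-IDF variant (term frequency normalized by document length, inverse document frequency as log(N / df)); the function names are hypothetical, and this is not the implementation used in any of the cited works.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens; word order is discarded."""
    return Counter(text.lower().split())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        length = sum(bag.values())
        weights.append(
            {
                term: (count / length) * math.log(n_docs / doc_freq[term])
                for term, count in bag.items()
            }
        )
    return weights


if __name__ == "__main__":
    # Two sentences with opposite meanings get identical representations.
    print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))
    print(tf_idf(["good film", "bad film"]))
```

Under this variant, a term that occurs in every document (here "film") receives weight zero, while the discriminating terms ("good", "bad") receive positive weights.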
Conference Paper
Some content, such as hate speech and offensive, toxic, or aggressive documents, is perceived differently by different consumers. Such content is commonly identified using classifiers based solely on textual content, which generalize pre-agreed meanings of difficult problems. These models provide the same results for each user, which leads to a high misclassification rate, observable especially for contentious, aggressive documents. Both document controversy and user nonconformity require new solutions. Therefore, we propose novel personalized approaches that respect individual beliefs expressed by either user conformity-based measures or various embeddings of their previous text annotations. We found that only a few annotations of the most controversial documents are enough for all our personalization methods to significantly outperform classic, generalized solutions. The more controversial the content, the greater the gain. The personalized solutions may be used to efficiently filter unwanted aggressive content in a way adjusted to a given person.
... The annotation schema was based on the procedures most widely used in NAWL, NAWL BE and plWordNet-emo (Zaśko-Zielińska et al., 2015; Janz et al., 2017; Kocoń et al., 2018; Kulisiewicz et al., 2015). Therefore, the acquired data consists of ten emotional categories: valence, arousal, and eight basic emotions: sadness, anticipation, joy, fear, surprise, disgust, trust and anger. ...
Conference Paper
Analysis of emotions elicited by opinions, comments, or articles commonly exploits annotated corpora, in which the labels assigned to documents average the views of all annotators, or represent a majority decision. The models trained on such data are effective at identifying the general views of the population. However, their usefulness for predicting the emotions evoked by the textual content in a particular individual is limited. In this paper, we present a study performed on a dataset containing 7,000 opinions, each annotated by about 50 people with two dimensions, valence and arousal, and with the intensity of eight emotions from Plutchik’s model. Our study showed that individual responses often differed significantly from the mean. Therefore, we proposed a novel measure to estimate this effect: Personal Emotional Bias (PEB). We also developed a new BERT-based transformer architecture to predict emotions from an individual human perspective. We found PEB to be a major factor in improving the quality of personalized reasoning. Both the method and the measure may boost the quality of content recommendation systems and personalized solutions that protect users from hate speech or unwanted content, which are highly subjective in nature.
... bag-of-words model (Buczynski & Wawer, 2008;Zhang, Jin, & Zhou, 2010) or TF-IDF (Davidson, Warmsley, Macy, & Weber, 2017;Kocoń, Janz, & Piasecki, 2018a;Sahlgren, Isbister, & Olsson, 2018;Senarath & Purohit, 2020). They are often enhanced with extra concepts from the given ontologies (Bloehdorn & Hotho, 2004) or WordNets (Bartusiak, Augustyniak, Kajdanowicz, Kazienko, & Piasecki, 2019;Janz, Kocon, Piasecki, & Zasko-Zielinska, 2017;Kocoń & Maziarz, 2021;Maziarz, Piasecki, Rudnicka, & Szpakowicz, 2013;Misiaszek et al., 2014;Piasecki, Broda, & Szpakowicz, 2009;Scott & Matwin, 1998) and used together with classical methods, such as SVM (Piasecki, Mlynarczyk, & Kocon, 2017;Razavi, Inkpen, Uritsky, & Matwin, 2010;Senarath & Purohit, 2020) or logistic regression (Davidson et al., 2017;Kocoń, Janz, & Piasecki, 2018b;Kocoń & Piasecki, 2012;Sahlgren et al., 2018;Waseem & Hovy, 2016). ...
Analysis of subjective texts such as offensive content or hate speech is a great challenge, especially regarding the annotation process. Most current annotation procedures aim to achieve a high level of agreement in order to generate a high-quality reference source. However, the annotation guidelines for subjective content may restrict the annotators’ freedom of decision making. Motivated by a moderate annotation agreement in offensive content datasets, we hypothesize that personalized approaches to offensive content identification should be in place. Thus, we propose two novel perspectives of perception: group-based and individual. Using demographics of annotators as well as embeddings of their previous decisions (annotated texts), we are able to train multimodal models (including transformer-based ones) adjusted to personal or community profiles. Based on the agreement of individuals and groups, we showed experimentally that annotator group agreeability strongly correlates with offensive content recognition quality. The proposed personalized approaches enabled us to create models adaptable to personal user beliefs rather than to an agreed understanding of offensiveness. Overall, our individualized approaches to offensive content classification outperform classic data-centric methods that generalize offensiveness perception, and this holds for all six tested models. Additionally, we developed requirements for annotation procedures, personalization and content processing to make the solutions human-centered.
We introduce a comprehensive evaluation benchmark for the Polish Word Sense Disambiguation task. The benchmark consists of 7 distinct datasets with sense annotations based on plWordNet 4.2. As far as we know, our work is the first attempt to standardise existing sense-annotated data for Polish. We also follow the recent trends in neural WSD solutions and test transfer learning models, as well as hybrid architectures combining lexico-semantic networks with neural text encoders. Finally, we investigate the impact of bilingual training on WSD performance. The bilingual model obtains new state-of-the-art performance in the Polish WSD task.
Keywords: WSD, Knowledge bases, Neural models, Benchmarking
5G networks offer novel communication infrastructure for Internet of Things applications, especially for healthcare applications. There, the edge-computing-enabled Internet of Medical Things (IoMT) provides online patient status monitoring. In this contribution, a Chicken Swarm Optimization algorithm, based on Energy Efficient Multi-objective clustering, is applied in an IoMT system. An effective fitness function is designed for cluster head selection. The performance of the proposed scheme is evaluated in a simulated environment.
Keywords: Energy efficiency, Network lifetime, Clustering, Cluster head selection, Delay, Chicken swarm optimization, Sensor networks, Adaptive networks
In this article, we describe the design principles of the ten newly published CLARIN-PL corpora of Slavic and Baltic languages. In relation to other non-commercial online corpora, we highlight the distinctive features of these CLARIN-PL corpora: resource selection, preprocessing, manual segmentation at the sentence level, lemmatisation, annotation and metadata. We also present current and planned work on the development of the CLARIN-PL Balto–Slavic corpora.