Maciej Piasecki

Wrocław University of Science and Technology | WUT · Department of Artificial Intelligence

PhD DSc

About

199
Publications
33,502
Reads
2,167
Citations


Publications (199)
Preprint
Full-text available
Advancements in AI and natural language processing have revolutionized machine-human language interactions, with question answering (QA) systems playing a pivotal role. The knowledge base question answering (KBQA) task, utilizing structured knowledge graphs (KG), allows for handling extensive knowledge-intensive questions. However, a significant ga...
Article
Full-text available
CLARIN is a European Research Infrastructure Consortium developing and providing a federated and interoperable platform to support scientists in the field of the Social Sciences and Humanities in carrying out language-related research. This contribution provides an overview of the entire infrastructure with a particular focus on tool interoperabili...
Article
Full-text available
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
Preprint
Full-text available
The BEIR dataset is a large, heterogeneous benchmark for Information Retrieval (IR) in zero-shot settings, garnering considerable attention within the research community. However, BEIR and analogous datasets are predominantly restricted to the English language. Our objective is to establish extensive large-scale resources for IR in the Polish langu...
Article
One of the main research questions concerning multi-word expressions (MWEs) is which of them are transparent word combinations created ad hoc and which are multi-word lexical units (MWUs). In this paper, we use selected corpus-linguistic and machine-learning methods to determine which lexicalization criteria guide Polish and English lexicographers...
Preprint
Full-text available
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-kn...
Conference Paper
Full-text available
Focusing on recognition of multi-word expressions (MWEs), we address the problem of recording MWEs in WordNet. In fact, not all MWEs recorded in that lexical database can be considered lexicalised beyond doubt (e.g. elements of wordnet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to impr...
Thesis
Full-text available
The quality of tools for recognising proper names in texts depends on the domain and coverage of the training data. Obtaining a model with satisfactory performance requires a corpus with a large number of samples, which translates into time spent on data annotation by users with skills such as machine learning engineers, ana...
Preprint
Full-text available
The availability of compute and data to train larger and larger language models increases the demand for robust methods of benchmarking the true progress of LM training. Recent years witnessed significant progress in standardized benchmarking for English. Benchmarks such as GLUE, SuperGLUE, or KILT have become de facto standard tools to compare lar...
Article
Full-text available
The results of manual mapping of Polish plWordNet onto English Princeton WordNet revealed a number of gaps and mismatches between those interlinked lexical resources. Preliminary studies have shown that they embrace wordnet-specific and language-specific differences, and in this exploratory study we focus on the latter, also called lacunae. Capital...
Chapter
Full-text available
In this work, we present an advanced semantic search engine dedicated to travel offers, allowing the user to create queries in natural language. We started with the Polish language in focus. Search for e-commerce requires a different set of methods and algorithms than search for travel, search for corporate documents, for law documents, for med...
Chapter
In this paper we present a new approach to the problem of lemmatisation in inflectional languages, using Polish as an example. We introduce the problem domain, describe the solution used – the Transformer architecture and learning process on lexical data – and present experimental results showing a high degree of generalization of the...
Chapter
Multiword Expression (MWE) detection is a crucial problem for many NLP applications. Recent methods approach it as a sequence labeling task and require a manually annotated corpus. Traditional methods are based on statistical association measures and show limited accuracy, especially on smaller corpora. In this paper, we propose a novel weakly sup...
Chapter
Full-text available
We propose and test multiple neuro-symbolic methods for sentiment analysis. They combine deep neural networks – transformers and recurrent neural networks – with external knowledge bases. We show that for simple models, adding information from knowledge bases significantly improves the quality of sentiment prediction in most cases. For medium-sized...
Chapter
Effective methods of the detection of multiword expressions are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme, while traditional methods use statistical measures. In our approach, we want to integrate the concepts of those two approaches. In this paper, we...
Chapter
Full-text available
In this article we present extended results obtained on the multidomain dataset of Polish text reviews collected within the Sentimenti project. We present preliminary results of classification models trained and tested on 7,000 texts annotated by over 20,000 individuals using valence, arousal, and eight basic emotions from Plutchik’s model. Additio...
Chapter
In this article, we take a new look at automated analysis and recognition of morpho-semantic relations in Polish. We present a combination of two methods for joint exploration of word-form information – generating new forms and classifying pairs of words in derivational relations. As a method of generation, we used the Transformer architecture in th...
Conference Paper
Full-text available
Effective methods for multiword expressions detection are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme applied to an annotated corpus, while traditional methods use statistical measures. In our approach, we want to integrate the concepts of those two appr...
Article
Full-text available
Emotion lexicons are useful in research across various disciplines, but the availability of such resources remains limited for most languages. While existing emotion lexicons typically comprise words, it is a particular meaning of a word (rather than the word itself) that conveys emotion. To mitigate this issue, we present the Emotion Meanings data...
Conference Paper
Full-text available
The paper reports on the methodology and final results of a large-scale synset mapping between plWordNet and Princeton WordNet. Dedicated manual and semi-automatic mapping procedures as well as interlingual relation types for nouns, verbs, adjectives and adverbs are described. The statistics of all types of interlingual relations are also provided.
Article
Full-text available
Robust methods have been proposed for content and topic-based text classification, as well as authorship attribution in stylometry. However, the problem of fine-grained literary genre (style) recognition is much less studied. We present several approaches to the recognition of eight literary genres manually annotated in a large corpus of Polish blog...
Chapter
Wordnets for different languages are linked through synsets - sets of synonymous word senses. We present a method for automatically transforming synset mappings into sense mappings to build a network of translational equivalents. Two heuristics based on a cross-lingual distributional similarity model are compared with several variants of machine learning ba...
Data
Presentation for the article: Propagation of emotions, arousal and polarity in WordNet using Heterogeneous Structured Synset Embeddings
Conference Paper
Full-text available
Relation Extraction is a fundamental NLP task. In this paper we investigate the impact of underlying text representation on the performance of neural classification models in the task of Brand-Product relation extraction. We also present the methodology of preparing annotated textual corpora for this task and we provide valuable insight into the pr...
Conference Paper
Full-text available
The paper presents the latest release of the Polish WordNet, namely plWordNet 4.1. The most significant developments since version 3.0 include new relations for nouns and verbs, mapping semantic role-relations from the valency lexicon Walenty onto the plWordNet structure and sense-level interlingual mapping. Several statistics are presented in...
Conference Paper
In this paper we present a morpho-syntactic tagger dedicated to Computer-mediated Communication texts in Polish. Its construction is based on an expanded RNN-based neural network adapted to the work on noisy texts. Among several techniques, the tagger utilises fastText embedding vectors, sequential character embedding vectors, and Brown clustering...
Article
Full-text available
Though the interest in use of wordnets for lexicography is (gradually) growing, no research has been conducted so far on equivalence between lexical units (or senses) in inter-linked wordnets. In this paper, we present and validate a procedure of sense-linking between plWordNet and Princeton WordNet. The proposed procedure employs a continuum of th...
Conference Paper
Full-text available
In this paper we present a novel method for emotive propagation in a wordnet based on a large emotive seed. We introduce a sense-level emotive lexicon annotated with polarity, arousal and emotions. The data were annotated as a part of a large study involving over 20,000 participants. A total of 30,000 lexical units in Polish WordNet were described...
Article
Automatic word sense disambiguation (WSD) has proven to be an important technique in many natural language processing tasks. For many years the problem of sense disambiguation has been approached with a wide range of methods; however, it is still a challenging problem, especially in the unsupervised setting. One of the well-known and successful app...
Conference Paper
Full-text available
In this article, we present a novel multidomain dataset of Polish text reviews. The data were annotated as part of a large study involving over 20,000 participants. A total of 7,000 texts were described with metadata; each text received about 25 annotations concerning polarity, arousal and eight basic emotions, marked on a multilevel scale. We pres...
Article
Full-text available
Though the interest in use of wordnets for lexicography is (gradually) growing, no research has been conducted so far on equivalence between lexical units (or senses) in inter-linked wordnets. In this paper, we present and validate a procedure of sense-linking between plWordNet and Princeton WordNet. The proposed procedure employs a continuum of th...
Article
Full-text available
Dynamic verbs in the Wordnet of Polish. The paper presents patterns of co-occurrences of wordnet relations involving verb lexical units in plWordNet - a large wordnet of Polish. The discovered patterns reveal tendencies of selected synset and lexical relations to form regular circular structures of clear semantic meanings. They involve several type...
Article
Full-text available
Lexical platform – the first step towards user-centred integration of lexical resources. The paper describes the Lexical Platform - a means for lightweight integration of independent lexical resources. Lexical resources (LRs) are represented as web components th...
Chapter
We present a method for computing semantic similarity of Polish texts, with the main focus on short texts. We have taken into account the limited set of language tools for Polish, and especially that syntactic and semantic parsers do not offer accuracy and robustness high enough to become a stable basis for similarity computation. A very lar...
Conference Paper
Full-text available
In this paper we present a novel approach to the construction of an extensive, sense-level sentiment lexicon built on the basis of a wordnet. The main aim of this work is to create a high-quality sentiment lexicon in a partially automated way. We propose a method called Classifier-based Polarity Propagation, which utilises a very rich set of wordne...
Article
Full-text available
This paper presents a supervised approach to the recognition of Cross-document Structure Theory (CST) relations in Polish texts. Its core is a graph-based representation constructed for sentences. Graphs are built on the basis of lexicalised syntactic-semantic relations extracted from text. Similarity between sentences is calculated as similarity b...
Chapter
The paper presents a new functionality of CLARIN-PL Language Technology Centre (LTC). LTC Platform is developed as a research place for processing, visualizing and depositing language data. It can connect and support the research workflow, enabling scientists to increase the efficiency and effectiveness of their research in connection to CLARIN ser...
Conference Paper
Full-text available
The paper presents a feature-based model of equivalence targeted at (manual) sense linking between Princeton WordNet and plWordNet. The model incorporates insights from lexicographic and translation theories on bilingual equivalence and draws on the results of earlier synset level mapping of nouns between Princeton WordNet and plWordNet. It takes i...
Conference Paper
Full-text available
In this paper we present a comprehensive overview of recent methods of sentiment propagation in a wordnet. Next, we propose a fully automated method called Classifier-based Polarity Propagation, which utilises a very rich set of features, where most of them are based on wordnet relation types, multi-level bag-of-synsets and bag-of-polarities....
Article
Full-text available
An open stylometric system based on multilevel text analysis. Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to several tasks beyond this canonical set, if only stylometric tools were more accessible to users from diff...
Article
Full-text available
This paper explores inter-lingual equivalence from the perspective of linking two large lexico-semantic databases, namely the Princeton WordNet of English and the plWordNet (pl. Słowosieć) of Polish. Wordnets are built as networks of lexico-semantic relations between words and their meanings, and constitute a type of monolingual dictionary cum thesa...
Conference Paper
Full-text available
In this paper we present our attempts in the PolEval 2017 Sentiment Analysis Task. The task is not only one of the first challenges in sentiment analysis focused on the Polish language, but also represents a novel approach to sentiment analysis, namely, predicting the sentiment not of a sentence, or a document, but of a word or a phrase within the cont...
Conference Paper
Full-text available
We present a large emotive lexicon of Polish which has been constructed by manual expansion of the emotive annotation defined for plWordNet 3.0 emo (a very large wordnet of Polish). The annotation encompasses: sentiment polarity, basic emotions and fundamental human values. Annotation scheme and revised guidelines for the annotation process are dis...
Conference Paper
We present a new morpho-syntactic tagger for Polish called MorphoDiTa-pl, which is based on the adaptation of the MorphoDiTa tagger developed originally for the Czech language. Following its basis, MorphoDiTa-pl utilises a rich feature averaged perceptron neural network for morphological analysis and morpho-syntactic disambiguation of the Polish lan...
Conference Paper
Full-text available
In this article we present the results of research on the recognition of genuine Polish suicide notes (SNs). We provide a useful method to distinguish between SNs and other types of discourse, including counterfeited SNs. The method uses a wide range of word-based and semantic features and it was evaluated using the Polish Corpus of Suicide Notes, whi...
Article
The paper focuses on the issue of creating equivalence links in the domain of bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units – lemma and sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicograph...
Conference Paper
Full-text available
We have released plWordNet 3.0, a very large wordnet for Polish. In addition to what is expected in wordnets – richly interrelated synsets – it contains sentiment and emotion annotations, a large set of multi-word expressions, and a mapping onto WordNet 3.1. Part of the release is enWordNet 1.0, a substantially enlarged copy of WordNet 3.1, with ma...
Conference Paper
Full-text available
One of the most significant events in Poland after the fall of communism was the emergence of internal conflict after the Polish president's plane crash on 10.04.2010. We investigate relations between political parties in Poland using opinion polls, member transitions between political parties and official speeches in the Polish Parliament 4 years before and af...
Conference Paper
With the growing size of a wordnet, it is becoming more and more difficult to avoid, identify and eliminate errors in it, especially when a group of editors work in parallel. That is the case of plWordNet. Thus we need elaborate tools both for error prevention during editing and for error detection after the work is completed. I...
Article
Full-text available
The complex nature of big data resources demands new structuring methods, especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language, as good implementations of it exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism WordN...
Preprint
The complex nature of big data resources demands new structuring methods, especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language, as good implementations of it exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism WordN...
Article
Full-text available
Lexical Means in Communicating Emotion in Suicide Notes - on the Basis of the Polish Corpus of Suicide Notes. The Polish Corpus of Suicide Notes (PCSN) is a relatively large set of authentic suicide notes that are linguistically annotated on several levels. In order to identify features characteristic for this genre we compared PCSN with the coll...
Article
Full-text available
The System of Register Labels in plWordNet. Stylistic registers influence word usage. Both traditional dictionaries and wordnets assign lexical units to registers, and there is a wide range of solutions. A system of register labels can be flat or hierarchical, with few labels or many, homogeneous or decomposed into sets of elementary features...
Article
Full-text available
Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources. Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult, not yet fully solved and different lexical resourc...
Article
Full-text available
The paper describes a system of lexico-semantic relations proposed for the nominal part of plWordNet 2.0, the largest Polish wordnet. We briefly introduce a wordnet as a large electronic thesaurus. We discuss sixteen nominal relations together with many sub-types proposed for plWordNet 2.0. Each relation is based on linguistic intuition and suppo...
Article
Full-text available
Semantic relations among adjectives in Polish WordNet 2.0: a new relation set, discussion and evaluation. Adjectives in wordnets are often neglected: there are many fewer of them than nouns, and relations among them are sometimes not as varied as those among nouns or verbs. Polish WordNet 1.0 was no exception. Version 2.0 aims to correct that...
Article
Full-text available
Semantic relations between verbs in Polish WordNet 2.0. The noun dominates wordnets. The lexical semantics of verbs is usually under-represented, even if it is essential in any semantic analysis which goes beyond statistical methods. We present our attempt to remedy the imbalance; it begins by designing a sufficiently rich set of wordnet relations...
Article
Automated extraction of lexical meanings from Polish corpora: potentialities and limitations. Large corpora are often consulted by linguists as a knowledge source with respect to lexicon, morphology or syntax. However, there are also several methods of automated extraction of semantic properties of language units from corpora. In the paper we focus...
Article
Full-text available
The paper investigates the accuracy of a Named Entity Recognition (NER) algorithm based on the Hidden Markov Model in the domain of Polish stock exchange reports. The task of NER was limited to the recognition and classification of Named Entities representing persons and companies. The algorithm was tested on a small Polish domain corpus of stock...
Article
Full-text available
Sentiment analysis is a very active research area. One of the problems in sentiment analysis is the classification of a text by its attitude, especially in reviews or comments from social media. In general, this problem can be solved by two different approaches: machine learning methods and lexicon-based methods. Methods ba...
Article
Full-text available
The paper offers a critical evaluation of the power and usefulness of an automatic prompt system based on the extended Relaxation Labelling algorithm in the process of (manual) mapping plWordNet on Princeton WordNet. To this end the results of manual mapping – that is inter-lingual relations between plWN and PWN synsets – are juxtaposed with the au...
Article
Self-organising Logic of Structures as a Basis for a Dependency-based Dynamic Semantics Model. We present Self-organising Logic of Structures (SLS), a semantic representation language of high expressive power, which was designed for a fully compositional representation of discourse anaphora following the Dynamic Semantics paradigm. The application...
Article
Full-text available
In the paper we present an extended version of the graph-based unsupervised Word Sense Disambiguation algorithm. The algorithm is based on the spreading activation scheme applied to the graphs dynamically built on the basis of the text words and a large wordnet. The algorithm, originally proposed for English and Princeton WordNet, was adapted to Po...
Conference Paper
Full-text available
Polish named entities are mostly out-of-vocabulary words, i.e. they are not described in morphological lexicons, and their proper analysis by Polish morphological analysers is difficult. The existing approaches to guessing unknown word lemmas and descriptions do not provide results on a satisfactory level. Moreover, lemmatisation of multiword named...
Conference Paper
A corpus-based Measure of Semantic Relatedness can be calculated for every pair of words occurring in the corpus, but it can produce erroneous results for many word pairs due to accidental associations derived on the basis of several context features. We propose a novel idea of a partial measure that assigns relatedness values only to word pairs we...
Article
A wordnet is many things to many people: a graph of inter-related lexicalised concepts, a taxonomy, a thesaurus, and so on. A wordnet makes good sense as the mainstay of any deep automated semantic analysis of text. We have begun the construction of a multi-component, multi-use toolkit of natural language processing tools with plWordNet, a very lar...
Conference Paper
Full-text available
Lexicalised concepts are represented in wordnets by word-sense pairs. The strength of markedness is one of the factors which influence word use. Stylistically unmarked words are largely context-neutral. Technical terms, obsolete words, "officialese", slangs, obscenities and so on are all marked, often strongly, and that limits their use considerabl...
Article
Full-text available
A wordnet can be built on the basis of an existing foreign-language wordnet, or corpus data can be taken as the basis for its construction. The first approach is simpler and more straightforward, which is why developers use it most often. However, this approach has a major drawback, chiefly that a resource built this way does not necessarily reflect the language for which it was...