Marek Maziarz

Wroclaw University of Science and Technology | WUT · Department of Artificial Intelligence

PhD, MSc

About

42
Publications
5,608
Reads
409
Citations
Introduction
My research interests cover the field of relational semantics and natural language processing. I lead a project devoted to polysemy theory, funded by the National Science Centre of Poland.
Additional affiliations
March 2010 - June 2016
Wroclaw University of Science and Technology
Position
  • Professor (Associate)
Education
September 2002 - September 2009
University of Wroclaw
Field of study
  • linguistics
October 2000 - September 2015
University of Wroclaw
Field of study
  • theoretical physics

Publications (42)
Conference Paper
Full-text available
WordNet is a state-of-the-art lexical resource used in many tasks in Natural Language Processing, including multi-word expression (MWE) recognition. However, not all MWEs recorded in WordNet can be indisputably called lexicalised. Some of them are semantically compositional and show no signs of idiosyncrasy. This state of affairs affects all evalua...
Chapter
Lexical resources are crucial in many modern applications of Natural Language Processing and Artificial Intelligence. We present VeSNet – a network of lexical resources resulting from the merge of Polish-English WordNet (PEWN) with several existing large electronic thesauri from the Linked Open Data cloud (DBpedia, Wikipedia, GeoWordNet, Agrovoc, E...
Article
Full-text available
In this article we extend a WordNet structure with relations linking synsets to Desikan’s brain regions. Based on lexicographer files and WordNet Domains the mapping goes from synset semantic categories to behavioural and cognitive functions and then directly to brain lobes. A human brain connectome (HBC) adjacency matrix was utilised to capture tr...
Conference Paper
Full-text available
In this paper we compare Oxford Lexico and Merriam Webster dictionaries with Princeton WordNet with respect to the description of semantic (dis)similarity between polysemous and homonymous senses that could be inferred from them. WordNet lacks any explicit description of polysemy or homonymy, but as a network of linked senses it may be used to comp...
Conference Paper
Full-text available
We propose a novel method of homonymy-polysemy discrimination for three Indo-European Languages (English, Spanish and Polish). Support vector machines and LASSO logistic regression were successfully used in this task, outperforming baselines. The feature set utilised lemma properties, gloss similarities, graph distances and polysemy patterns. The p...
Article
Full-text available
Expanding WordNet with Gloss and Polysemy Links for Evocation Strength Recognition. Evocation – a phenomenon of sense associations going beyond standard (lexico)-semantic relations – is difficult to recognise for natural language processing systems. Machine learning models give predictions which are only moderately correlated with the evocation str...
Conference Paper
In this paper we present a new individual measure for the task of evocation strength prediction. The proposed solution is based on Dijkstra’s distances calculated on the WordNet graph expanded with polysemy relations. The polysemy network was constructed using chaining procedure executed on individual word senses of polysemous lemmas. We show that...
Conference Paper
Full-text available
According to George K. Zipf, more frequent words have more senses. We have tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese, Indonesian and Chinese. We have proved that the law works pretty well for all of these languages if we take - as Zipf did - mean values of meaning count and averaged ranks....
Conference Paper
Full-text available
We have released plWordNet 3.0, a very large wordnet for Polish. In addition to what is expected in wordnets – richly interrelated synsets – it contains sentiment and emotion annotations, a large set of multi-word expressions, and a mapping onto WordNet 3.1. Part of the release is enWordNet 1.0, a substantially enlarged copy of WordNet 3.1, with ma...
Conference Paper
With the growing size of a wordnet, it is becoming more and more difficult to avoid, identify and eliminate errors in it, especially when a group of editors work in parallel. That is the case of plWordNet. Thus we need elaborated tools for both error prevention during editing, and diagnostic tools for error detection after the work was completed. I...
Article
Full-text available
The System of Register Labels in plWordNet. Stylistic registers influence word usage. Both traditional dictionaries and wordnets assign lexical units to registers, and there is a wide range of solutions. A system of register labels can be flat or hierarchical, with few labels or many, homogeneous or decomposed into sets of elementary features...
Article
Full-text available
The paper describes a system of lexico-semantic relations proposed for the nominal part of plWordNet 2.0 – the largest Polish wordnet. We briefly introduce a wordnet as a large electronic thesaurus. We discuss sixteen nominal relations together with many sub-types proposed for plWordNet 2.0. Each relation is based on linguistic intuition and suppo...
Article
Full-text available
Developing free morphological data for Polish. A limiting factor in construction of Natural Language Processing (NLP) systems is often the availability of morphological resources. This indeed happens for Polish: the freely available corpus with manual morpho-syntactic annotation (part of the IPI PAN Corpus) is not coupled with any free morpho...
Article
Full-text available
Semantic relations among adjectives in Polish WordNet 2.0: a new relation set, discussion and evaluation. Adjectives in wordnets are often neglected: there are many fewer of them than nouns, and relations among them are sometimes not as varied as those among nouns or verbs. Polish WordNet 1.0 was no exception. Version 2.0 aims to correct that...
Article
Full-text available
Semantic relations between verbs in Polish WordNet 2.0. The noun dominates wordnets. The lexical semantics of verbs is usually under-represented, even if it is essential in any semantic analysis which goes beyond statistical methods. We present our attempt to remedy the imbalance; it begins by designing a sufficiently rich set of wordnet relations...
Article
Shallow syntactic annotation in the corpus of Wrocław University of Technology. In this paper we present shallow syntactic annotation of The Wrocław University of Technology Corpus. We discuss some theoretical and practical considerations related to shallow parsing of Polish, then we present our annotation guidelines. The proposed annotation scheme...
Article
A wordnet is many things to many people: a graph of inter-related lexicalised concepts, a taxonomy, a thesaurus, and so on. A wordnet makes good sense as the mainstay of any deep automated semantic analysis of text. We have begun the construction of a multi-component, multi-use toolkit of natural language processing tools with plWordNet, a very lar...
Conference Paper
Full-text available
Lexicalised concepts are represented in wordnets by word-sense pairs. The strength of markedness is one of the factors which influence word use. Stylistically unmarked words are largely context-neutral. Technical terms, obsolete words, "officialese", slangs, obscenities and so on are all marked, often strongly, and that limits their use considerabl...
Conference Paper
A method for the recognition of the compositionality of Multi Word Expressions (MWEs) is proposed. First, we study associations between MWEs and the structure of wordnet lexico-semantic relations. A simple method of splitting plWordNet’s MWEs into compositional and non-compositional on the basis of the hypernymy structure is discussed. However, our...
Article
Full-text available
Wordnets are built of synsets, not of words. A synset consists of words. Synonymy is a relation between words. Words go into a synset because they are synonyms. Later, a wordnet treats words as synonymous because they belong in the same synset … Such circularity, a well-known problem, poses a practical difficulty in wordnet construction...
Article
Full-text available
The paper presents WordNetLoom – an application for WordNet development used in the construction of a Polish WordNet called plWordNet. WordNetLoom provides two means of interaction: a form-based, implemented initially, and a graph-based introduced recently. The graphical, active presentation of WordNet structure enables direct work on the structure...
Article
The paper presents a rule-based approach to semantic relation recognition within the Polish noun phrase. A set of semantic relations, including some thematic relations, has been determined for the need of experiments. The method consists of two steps: first the system recognizes word pairs and triples, and then it classifies the relations. Evaluat...
Conference Paper
Full-text available
Wordnets are lexico-semantic resources essential in many NLP tasks. Princeton WordNet is the most widely known, and the most influential, among them. Wordnets for languages other than English tend to adopt unquestioningly WordNet's structure and its net of lexicalised concepts. We discuss a large wordnet constructed independently of WordNet, upon a...
Conference Paper
URL: https://www.aclweb.org/anthology/C12-3004
Conference Paper
The Polish Wordnet, plWordNet, has been in steady development for five years. We are building it from scratch, all the time making provisions for its general compatibility with the other major wordnets. We are very close to reaching a milestone of 100000 lexical units in 70000 synsets. In addition to a fairly comprehensive coverage of common nouns,...
Article
Full-text available
We present a machine learning approach to the generation of derivative relations. Instances of derivative relations described in a wordnet are used in the bootstrapping approach to build an analyser of derivational relations. plWordNet derivational relations are presented and the planned semi-automatic wordnet expansion with derivational relations...
Conference Paper
Full-text available
The paper presents WordnetLoom - a new version of an application supporting the development of the Polish wordnet called plWordNet. The primary user interface of WordnetLoom is a graph-based, graphical, active presentation of wordnet structure. Linguists can work directly on the structure of synsets linked by relation links. The new version is compa...
Thesis
Full-text available
The subject of this dissertation is the grammatical aspect in 14th-century Old Polish, in two medieval manuscripts: The Holy Cross Sermons and The Saint Florian Psalter. The Old Church Slavonic verbal system (an example of a Late Proto-Slavic dialect) is the background of this research. The analysis has been based on hypotheses and methodology of the...
Article
Full-text available
Verbs usually make up 10-15% of the material in a wordnet. We have embarked on a project in which this proportion increases to a lexically more realistic 1/3. The development of plWordNet 2.0 has been geared toward building up the verb hierarchy very considerably. The scaffolding upon which this part of our wordnet will rest is a detailed system o...
Article
Full-text available
This paper discusses the problem of shallow parsing of Polish, most specifically — chunking. We discuss some theoretical issues related to chunking of Polish texts and propose our chunk annotation guidelines. In the second part of the paper we present initial results of using Machine Learning algorithms to train a working chunker for the proposed c...
Article
Full-text available
The paper presents the construction of Derywator – a language tool for the recognition of Polish derivational relations. It was built on the basis of machine learning in a way following the bootstrapping approach: a limited set of derivational pairs described manually by linguists in plWordNet is used to train Derywator. The tool is intended to be appl...
Article
Full-text available
This paper presents our efforts aimed at collecting and annotating a free Polish corpus. The corpus will serve us as training and testing material for experiments with Machine Learning algorithms. As others may also benefit from the resource, we are going to release it under a Creative Commons licence, which is hoped to remove unnecessary usage...
Article
Full-text available
Building a wordnet is a serious undertaking. Fortunately, Language Technology (LT) can improve the process of wordnet construction both in terms of quality and cost. In this paper we present LT tools used during the construction of plWordNet and their influence on the lexicographer's workflow. LT is employed in plWordNet development on every poss...

Projects (3)
Project
Lexical polysemy lies at the center of modern theories of meaning, since it is not possible to talk about word senses without considering their ambiguity, the boundaries between senses and the effect of context on shifts in meaning. To study this, we draw on techniques from general, comparative and computational linguistics. Full title: Mechanisms of Polysemy on the Basis of Analysis of Lexical Networks in Comparative Perspective.
Project
CLARIN (Common Language Resources and Technology Infrastructure) is a pan-European research infrastructure intended for the humanities and social sciences. It facilitates work with very large collections of texts.