
Adam Przepiórkowski - Polish Academy of Sciences
About
234 Publications
18,611 Reads
2,217 Citations
Introduction
My ResearchGate profile is very incomplete and may contain all kinds of misinformation. The only place that contains my up-to-date bibliography is http://zil.ipipan.waw.pl/AdamPrzepiorkowski. I have no time to maintain other lists. Apparently, other people share this sentiment:
• http://mahlowgpp2013.blogspot.ch/2013/10/professor-for-one-year-week-23.html
• http://nlphist.hypotheses.org/107
Current institution
Additional affiliations
February 1999 - present
Publications (234)
Heterofunctional Coordination (HC), in which conjuncts bear different grammatical functions (as in English What and when to eat to stay healthy), is assumed to be solely multiclausal in Germanic languages, i.e., to be underlyingly a coordination of clauses. This is supposed to distinguish Germanic from Slavic, where monoclausal HC is also possible,...
The aim of this article is to show that, contrary to opinions expressed in dictionaries, grammars, textbooks for learners of Polish, and theoretical works, instrumental forms of cardinal numerals ending in -u (e.g. pięciu) are not, in contemporary Polish, distributionally equivalent variants of the forms in -oma (e.g. pięcioma). On the basis...
The issue of the syntactic category of unlike-category coordination has been elusive for decades, with a plethora of proposals, all deficient in one way or another. This chapter proposes to broaden the perspective and consider disjunctive constraints which are not limited to syntactic categories, but which also take into consideration morphosyntact...
The aim of this paper is to propose three improvements to the HPSG model theory. The first is a solution to certain formal problems identified in Richter 2007. These problems are solved if HPSG models are rooted models of utterances and not exhaustive models of languages, as currently assumed. The proposed solution is compatible with all existing v...
The aim of this paper is to provide a syntactico-semantic analysis of hybrid coordination, in which what is coordinated are phrases bearing different grammatical functions and different semantic roles. The proposed account improves on previous HPSG analyses by giving up the assumption that all conjuncts are dependents of the same head and, more imp...
Bruening and Al Khalaf (2020) deny the possibility of coordination of unlike categories. They use three mechanisms to reanalyze such coordination as involving same categories: conjunction reduction, supercategories, and empty heads. We show that their attempt leaves many cases of unlike category coordination unaccounted for and we point out various...
This squib argues that adverbs can act as primary predicates. In Polish, a relatively large class of adverbs are frequently used in predicative constructions when the subject of predication is an InfP (infinitival phrase) or a CP referring to abstract objects: event kinds or facts. This requirement of a purely verbal rather than nominal subject of...
The paper describes the conversion of an LFG treebank of Polish into enhanced Universal Dependencies, and—more generally—identifies the kinds of information lost in translation from LFG to UD. The paper also presents the resulting UD treebank of Polish and compares it to the previous UD treebank of Polish.
The aim of this paper is to compare two Polish predicative constructions with infinitival subjects, namely those with predicative adverbs and those with predicative adjectives. The latter construction, of the form “predicative adjective + copula + infinitival subject”, has hardly been noticed in Polish literature on predication, copulas, or infinit...
The first aim of this article is to confirm that predicative szkoda, wstyd, żal are not improper verbs (czasowniki niewłaściwe). Two other views on the status of units such as szkoda hold that they are adverbs or nouns. The author shows that both are true: the units in question are categorially ambiguous, and there is a range of tests...
We examine two phenomena which, with the exception of Bogal-Allbritten & Weir (2017), have not been systematically studied together but are clearly related: (a) epistemic adverbs in ad-nominal positions modifying a DP outside of coordination and (b) epistemic adverbs modifying a DP within a coordination of DPs (Collins conjunction). Ad-nominal adve...
The aim of this squib is to question the popular belief that the metaphor of valency was introduced to linguistics by Lucien Tesnière in the middle of the 20th century. Rather, we show that it was first used by Charles Peirce half a century earlier, leading to apparently independent – but probably mediated by Roman Jakobson – ‘discoveries’ of this meta...
On the argument-adjunct distinction in the Polish Semantic Syntax tradition
The aim of this paper is to examine the understanding of the Argument-Adjunct Distinction within the Polish Semantic Syntax (SS) tradition, associated with the name of Stanisław Karolak and presented in the nominally syntactic volume of the Grammar of contemporary Polish (...
The aim of this paper is to propose a fully hierarchical organisation of valency information in Lexical Functional Grammar, inspired by recent LFG work on using templates to encode valency. The particular proposal rather closely follows FrameNet's inheritance hierarchy, makes heavy use of templates to encode multiple inheritance, and avoids the pro...
The article presents a new formal grammar of Polish, POLFIE, couched in the theory of Lexical Functional Grammar. It introduces the basics of LFG, discusses how syntactic structures are represented in LFG, and then illustrates the capabilities of the LFG formalism with analyses of selected phenomena of Polish in POLFIE. The article sh...
The article presents a new valency dictionary of Polish developed by the Linguistic Engineering Group at IPI PAN. This dictionary, Walenty, is still under development, but already now, at the end of 2016, it is the largest and most detailed valency dictionary of Polish, and the only one that fully integrates the levels of description: syntactic, semant...
The aim of this paper is to reexamine the rich repertoire of grammatical functions assumed in LFG and provide novel arguments for the claim, voiced earlier for example in Alsina et al. 2005, that most of them are redundant. We also demonstrate that a textbook LFG test for the sameness of grammatical functions of different predicates fails on closer...
The paper briefly reexamines arguments for the argument–adjunct dichotomy, commonly assumed in contemporary linguistics, showing that they do not stand up to scrutiny. It demonstrates that – perhaps surprisingly – LFG currently only assumes this dichotomy in its f-structure feature geometry, and does not rely on it in any crucial way. Building on t...
The aim of this paper is to critically examine the tests used to distinguish arguments from adjuncts in Functional Generative Description (Sgall et al., 1986) and to question the general usefulness of this distinction. In particular, we demonstrate that neither of the two tests used in FGD to distinguish inner participants from free adverbials (i.e...
Phraseological components of valency dictionaries for two West Slavic languages are presented, namely, of the PDT-Vallex dictionary for Czech and of the Walenty dictionary for Polish. Both dictionaries are corpus-based, albeit in different ways. Both are machine-readable and employable by syntactic parsers and generators. The paper compares the exp...
Towards a Linguistically-Oriented Textual Entailment Test-Suite for Polish Based on the Semantic Syntax Approach
The aim of this programmatic position paper is to show that the semantic syntax tradition of Polish linguistics associated with the name of Stanisław Karolak may be a basis for the development of a taxonomy of entailment types and...
The aim of this paper is to present PARSEME, a COST Action devoted to the issue of Multiword Expressions in parsing and in linguistic resources (corpora, lexicons). This is a “meta-paper” intended to be the main citation point for any future work referring to PARSEME: it does not describe in detail any single result of the Action, but rather summar...
Linguistic engineering and the current situation of Polish linguistics
The thesis of this reply to Piotr Żmigrodzki’s pessimistic diagnosis (published in the previous issue of „LingVaria”) of Polish linguistics from the perspective of modern lexicography is that the diagnosis from the perspective of linguistic engineering must be equally pessimist...
This paper presents a method of annotating sentences with dependency trees which is set within the mainstream of the study on dependency projection. The approach builds on the idea of weighted projection. However, we involve a weighting factor not only in the process of projecting dependency relations (weighted projection) but also in the process o...
The aim of this paper is to propose a method of simulating - in a syntactico-semantic parser - the behaviour of semantic roles in the case of a language that has no resources such as VerbNet or FrameNet, but has relatively rich morphosyntax (here: Polish). We argue that using an approximation of semantic roles derived from syntactic (grammatical functi...
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative's work throughout Europe in order to boost progress a...
The aim of this paper is to refute the thesis by Zygmunt Saloni that structures such as “Kto, co i komu dał?” (“Who gave what to whom?”, literally: “Who, what and whom gave?”) do not belong to the system of contemporary Polish and that their conjunctionless counterparts (here: “Kto co komu dał?”, literally: “Who what whom gave?”) should be used ins...
This book constitutes the refereed proceedings of the 9th International Conference on Advances in Natural Language Processing, PolTAL 2014, Warsaw, Poland, in September 2014. The 27 revised full papers and 20 revised short papers presented were carefully reviewed and selected from 83 submissions. The papers are organized in topical sections on morp...
The aim of this paper is to propose a far-reaching extension of the phraseological component of a valence dictionary for Polish. The dictionary is the basis of two different parsers of Polish; its format has been designed so as to maximise the readability of the information it contains and its re-applicability. We believe that the extension proposed...
This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs...
This paper presents a syntactic HPSG analysis of distance distributivity in Polish, where the challenge is to uniformly analyse a number of function lexemes PO 'each' which share their form and semantic contribution, but differ in their syntactic behaviour. To this end, the HPSG notion of weak head is employed in a novel way.
This paper presents DBPediaExtender, an information extraction system that aims at extending an existing ontology of geographical entities by extracting information from text. The system uses distant supervision learning – the training data is constructed on the basis of matches between values from infoboxes (taken from the Polish DBPedia) and Wiki...
The ongoing National Corpus of Polish project assumes several levels of linguistic annotation. We present the technical environment and methodological background developed for the three upper annotation levels: the levels of syntactic words, syntactic groups and named entities. We show how knowledge-based platforms Spejd and Sprout are used for the...
The ever-growing popularity of Google over the recent decade has required a specific method of man-machine communication: a human query should be short, whereas the machine's answer may take the form of a wide range of documents. This type of communication has triggered a rapid development in the domain of Information Extraction, aimed at providing the a...
The paper introduces a new method for detecting and correcting errors in large dependency treebanks with rich morphosyntactic annotation. The technique uses error correction rules automatically extracted from the treebank. The procedure of rule extraction is based on a comparison of similar – but not identical – subgraphs of dependency structures....
This paper describes a novel weighted projection method of inducing grammatical dependency structures for Polish. Using a parallel English-Polish corpus, the English side is automatically annotated with a syntactic parser and the resulting annotations are projected to Polish via word alignment links. Projected arcs are weighted according to the cer...
The paper presents a modification — aimed at highly inflectional languages — of a recently proposed error detection method for syntactically annotated corpora. The technique described below is based on Synchronous Tree Substitution Grammar (STSG), i.e. a kind of tree transducer grammar. The method involves induction of STSG rules from a treebank an...
The paper describes a method for measuring compatibility between two levels of manual corpus annotation: shallow and deep. The proposed measures translate into a procedure for finding annotation errors at either level.
The ATLAS project, started in March 2010, intends to create a multilingual language processing framework integrating the common set of linguistic tools for a group of European languages, among them Polish. The chained tools producing multi-level UIMA-encoded annotation of texts can be used by NLP applications for complex language-intensive operatio...
The emergence of the WWW as the main source of distributing content opened the floodgates of information. The sheer volume and diversity of this content necessitate an approach that will reinvent the way it is analysed. The quantitative route to processing information which relies on content management tools provides structural analysis. The challe...
This paper describes the project of annotating the National Corpus of Polish with word senses for a selection of ambiguous lexemes. The WSDDE (Word Sense Disambiguation Development Environment) tool is used for performing experiments with various supervised machine learning techniques and feature sets. The best algorithm is described in detail an...
This paper presents a method used for extracting temporal information from raw texts in Polish. The extracted information consists of the text fragments which describe events, the time expressions and the temporal relations between them. Together with temporal reasoning, it can be used in applications such as question answering or for text summariz...
This paper provides an explicit, formal analysis of two not only interesting but also frequent coordination phenomena: The coordination of unlike categories and the coordination of distinct grammatical functions possibly belonging to entirely different levels of structure. The proposed account of the former makes it possible to take full advantage...
This article proposes a method of Semantic Role Labelling for languages with no reliable deep syntactic parser and with limited corpora annotated with semantic roles. Reasonable results may be achieved with the help of shallow parsing, provided that features used for training such shallow parsers include both lexical semantic information (here: hyp...
Merging of Language Resources is not only a matter of mapping between different annotation schemata but also of linguistic tools coping with heterogeneous annotation formats in order to produce one single output. In this paper we present a web content management system ATLAS which succeeded in integrating and harmonizing resources and tools for six lan...
The paper intends to give a brief summary of one of the most recent efforts on building the pan-European language technology infrastructure: META-NET – a Network of Excellence consisting of 54 research centres from 33 countries – and specifically, its Central and South-European participating project: CESAR. One of the major activities of the project i...
This paper addresses the problem of converting part of speech – or, more generally, morphosyntactic – annotations within a single language. Conversion between tagsets is a difficult task and, typically, it is either expensive (when performed manually) or inaccurate (lossy automatic conversion or re-tagging with classical taggers). A statistical met...