Sebastian Padó

Universität Stuttgart · Institute for Natural Language Processing

Professor

About

171
Publications
22,444
Reads
4,360
Citations
Citations since 2016
69 Research Items
2213 Citations
[Chart: yearly counts for 2016 to 2022; axis range 0 to 400]
Additional affiliations
October 2013 - present
Universität Stuttgart
Position
  • Professor of Computational Linguistics
October 2010 - September 2013
Universität Heidelberg
Position
  • Professor of Computational Linguistics

Publications

Preprint
Full-text available
Even though fine-tuned neural language models have been pivotal in enabling "deep" automatic text analysis, optimizing text representations for specific applications remains a crucial bottleneck. In this study, we look at this problem in the context of a task from computational social science, namely modeling pairwise similarities between political...
Conference Paper
Full-text available
Positional analyses in political science target the identification of (dis)similarities between parties based on their stance on a given policy or political demand (claim). Such claims are, unsurprisingly, a well-explored source of information for text-as-data approaches to this task. Political actors, however, do not only make claims regarding a given po...
Article
Full-text available
In recent years, there has been an increasing awareness that many NLP systems incorporate biases of various types (e.g., regarding gender or race) which can have significant negative consequences. At the same time, the techniques used to statistically analyze such biases are still relatively simple. Typically, studies test for the presence of a sig...
Preprint
Full-text available
A number of models for neural content-based news recommendation have been proposed. However, there is limited understanding of the relative importance of the three main components of such systems (news encoder, user encoder, and scoring function) and the trade-offs involved. In this paper, we assess the hypothesis that the most widely used means o...
Article
Full-text available
A recent research direction in computational linguistics involves efforts to make the field, which used to focus primarily on English, more multilingual and inclusive. However, resource creation often remains a bottleneck for many languages, in particular at the semantic level. In this article, we consider the case of frame-semantic annotation. We...
Article
Full-text available
The ‘short answer’ question format is a widely used tool in educational assessment, in which students write one to three sentences in response to an open question. The answers are subsequently rated by expert graders. The agreement between these graders is crucial for reliable analysis, both in terms of educational strategies and in terms of develo...
Preprint
Full-text available
The capabilities and limitations of BERT and similar models are still unclear when it comes to learning syntactic abstractions, in particular across languages. In this paper, we use the task of subordinate-clause detection within and across languages to probe these properties. We show that this task is deceptively simple, with easy gains offset by...
Chapter
Full-text available
In this paper, we are concerned with the phenomenon of function word polysemy. We adopt the framework of distributional semantics, which characterizes word meaning by observing occurrence contexts in large corpora and which is in principle well situated to model polysemy. Nevertheless, function words were traditionally considered as impossible to a...
Preprint
Full-text available
Source code summarization is the task of generating a high-level natural language description for a segment of programming language code. Current neural models for the task differ in their architecture and the aspects of code they consider. In this paper, we show that three SOTA models for code summarization work well on largely disjoint subsets of...
Conference Paper
Full-text available
The capabilities and limitations of BERT and similar models are still unclear when it comes to learning syntactic abstractions, in particular across languages. In this paper, we use the task of subordinate-clause detection within and across languages to probe these properties. We show that this task is deceptively simple, with easy gains offset by...
Preprint
Full-text available
Newspaper reports provide a rich source of information on the unfolding of public debate on specific policy fields that can serve as basis for inquiry in political science. Such debates are often triggered by critical events, which attract public attention and incite the reactions of political actors: crisis sparks the debate. However, due to the c...
Article
Full-text available
We present the results of a large-scale corpus-based comparison of two German event nominalization patterns: deverbal nouns in -ung (e.g., die Evaluierung, ‘the evaluation’) and nominal infinitives (e.g., das Evaluieren, ‘the evaluating’). Among the many available event nominalization patterns for German, we selected these two because they are both...
Preprint
Full-text available
The recognition of hate speech and offensive language (HOF) is commonly formulated as a classification task to decide if a text contains HOF. We investigate whether HOF detection can profit by taking into account the relationships between HOF and similar concepts: (a) HOF is related to sentiment analysis because hate speech is typically a negative...
Article
Full-text available
Cognitive scientists have long used distributional semantic representations of categories. The predominant approach uses distributional representations of category-denoting nouns, such as “city” for the category city. We propose a novel scheme that represents categories as prototypes over representations of names of its members, such as “Barcelona,...
Preprint
Full-text available
In structured prediction, a major challenge for models is to represent the interdependencies within their output structures. For the common case where outputs are structured as a sequence, linear-chain conditional random fields (CRFs) are a widely used model class which can learn local dependencies in output sequences. However, the CRF's Markov ass...
Preprint
Full-text available
When humans judge the affective content of texts, they also implicitly assess the correctness of such judgment, that is, their confidence. We hypothesize that people's (in)confidence that they performed well in an annotation task leads to (dis)agreements among each other. If this is true, confidence may serve as a diagnostic tool for systematic dif...
Preprint
Full-text available
Span identification (in short, span ID) tasks such as chunking, NER, or code-switching detection, ask models to identify and classify relevant spans in a text. Despite being a staple of NLP, and sharing a common structure, there is little insight on how these tasks' properties influence their difficulty, and thus little guidance on what model famil...
Book
Full-text available
The Center for Reflected Text Analytics (CRETA) develops interdisciplinary mixed methods for text analytics in the research fields of the digital humanities. This volume is a collection of text analyses from specialty fields including literary studies, linguistics, the social sciences, and philosophy. It thus offers an overview of the methodology o...
Chapter
Full-text available
Most approaches to emotion analysis in fictional texts focus on detecting the emotion class expressed over the course of a text, either with machine learning-based classification or with dictionaries. These approaches do not consider who experiences the emotion and what triggers it and therefore, as a necessary simplification, aggregate across dif...
Article
Full-text available
Discourse network analysis is an aspiring development in political science which analyzes political debates in terms of bipartite actor/claim networks. It aims at understanding the structure and temporal dynamics of major political debates as instances of politicized democratic decision making. We discuss how such networks can be constructed on the...
Article
Full-text available
This article investigates the integration of machine learning in the political claim annotation workflow with the goal to partially automate the annotation and analysis of large text corpora. It introduces the MARDY annotation environment and presents results from an experiment in which the annotation quality of annotators with and without machine...
Conference Paper
Full-text available
DEbateNet-migr15 is a manually annotated dataset for German which covers the public debate on immigration in 2015. The building block of our annotation is the political science notion of a claim, i.e., a statement made by a political actor (a politician, a party, or a group of citizens) that a specific action should be taken (e.g., vacant flats sho...
Preprint
In response to the continuing research interest in computational semantic analysis, we have proposed a new task for SemEval-2010: multi-way classification of mutually exclusive semantic relations between pairs of nominals. The task is designed to compare different approaches to the problem and to provide a standard testbed for future research. In t...
Conference Paper
Full-text available
This paper describes the MARDY corpus annotation environment developed for a collaboration between political science and computational linguistics. The tool realizes the complete workflow necessary for annotating a large newspaper text collection with rich information about claims (demands) raised by politicians and other actors, including claim an...
Conference Paper
Full-text available
Understanding the structures of political debates (which actors make what claims) is essential for understanding democratic political decision-making. The vision of computational construction of such discourse networks from newspaper reports brings together political science and natural language processing. This paper presents three contributions t...
Preprint
This paper is a first attempt at reconciling the current methods of distributional semantics with the function word emphasis of formal linguistics. We consider a multiply polysemous function word, the German reflexive pronoun "sich", and investigate in which ways natural subclasses of this word known from the theoretical and typological literature...
Article
Full-text available
The empirical study of emotions in Spanish travelogues and reports requires cultural knowledge as well as the use of linguistic annotation and quantitative methods. We report on an interdisciplinary project in which we perform emotion annotation on a selection of texts spanning several centuries to analyze the differences across different time slic...
Preprint
Full-text available
Sentiment analysis has a range of corpora available across multiple languages. For emotion analysis, the situation is more limited, which hinders potential research on cross-lingual modeling and the development of predictive models for other languages. In this paper, we fill this gap for German by constructing deISEAR, a corpus designed in analogy...
Conference Paper
Full-text available
Categorization is a central capability of human cognition, and a number of theories have been developed to account for properties of categorization. Despite the fact that many semantic tasks involve categorization, theories of categorization do not play a major role in contemporary research in computational linguistics. This paper follows the idea...
Chapter
This chapter gives an overview of work on the representation of semantic information in lexicon resources for computational natural language processing (NLP). It starts with a broad overview of the history and state of the art of different types of semantic lexicons in Computational Linguistics, and discusses their main use cases. Section 2 is devo...
Article
Full-text available
One of the central problems in the semantics of derived words is polysemy (see, for example, the recent contributions by Lieber 2016 and Plag et al. 2018). In this paper, we tackle the problem of disambiguating newly derived words in context by applying Distributional Semantics (Firth 1957) to deverbal -ment nominalizations (e.g. bedragglement,...
Chapter
One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects rep...
Article
Characterizing paraphrases formally has proven to be a challenging task. Hasegawa et al. (2011) pointed out the usefulness of FrameNet for paraphrase research, focusing on paraphrases which are backed by underlying classical linguistic relationships such as synonymy or voice alternations. This article proposes that other frame-to-frame-relations, n...
Preprint
In computational linguistics, a large body of work exists on distributed modeling of lexical relations, focussing largely on lexical relations such as hypernymy (scientist -- person) that hold between two categories, as expressed by common nouns. In contrast, computational linguistics has paid little attention to entities denoted by proper nouns (M...
Article
Word sense induction is the most prominent unsupervised approach to lexical disambiguation. It clusters word instances, typically represented by their bag-of-words contexts. Therefore, uninformative and ambiguous contexts present a major challenge. In this paper, we investigate the use of an alternative instance representation based on lexical subs...
Conference Paper
Full-text available
Word sense induction is the most prominent unsupervised approach to lexical disambiguation. It clusters word instances, typically represented by their bag-of-words contexts. Therefore, uninformative and ambiguous contexts present a major challenge. In this paper, we investigate the use of an alternative instance representation based on lexical subs...
Article
Full-text available
Complement coercion (begin a book → reading) involves a type clash between an event-selecting verb and an entity-denoting object, triggering a covert event (reading). Two main factors involved in complement coercion have been investigated: the semantic type of the object (event vs. entity), and the typicality of the covert event (the author began a...
Conference Paper
Full-text available
We report on a study applying compositional distributional semantic models (CDSMs) to a set of Ukrainian derivational patterns. Ukrainian is an interesting language as it is morphologically rich, and low-resource. Our study aims at resolving inconsistent results from previous studies which employed CDSMs for derivation; we provide evidence for a cr...
Conference Paper
Full-text available
This paper analyses the development of emotions in different genres of literature. We discover different emotion patterns in adventures, mystery, science fiction, romance and humorous fiction stories. We support our findings with quantitative and qualitative analyses.
Article
Reference is the crucial property of language that allows us to connect linguistic expressions to the world. Modeling it requires handling both continuous and discrete aspects of meaning. Data-driven models excel at the former, but struggle with the latter, and the reverse is true for symbolic models. We propose a fully data-driven, end-to-end trai...
Conference Paper
Full-text available
There is a rich variety of data sets for sentiment analysis (viz., polarity and subjectivity classification). For the more challenging task of detecting discrete emotions following the definitions of Ekman and Plutchik, however, there are far fewer data sets, and notably no resources for the social media domain. This paper contributes to closing...
Conference Paper
Full-text available
Compositional distributional semantic models (CDSMs) have successfully been applied to the task of predicting the meaning of a range of linguistic constructions. Their performance on the semi-compositional word formation process of (morphological) derivation, however, has been extremely variable, with no large-scale empirical investigation to date. Thi...
Article
One of the most basic functions of language is to refer to objects in a shared scene. Modeling reference with continuous representations is challenging because it requires individuation, i.e., tracking and distinguishing an arbitrary number of referents. We introduce a neural network model that, given a definite description and a set of objects rep...
Article
Native Language Identification (NLI) is the task of recognizing the native language of an author from text that they wrote in another language. In this paper, we investigate the generalizability of NLI models among learner corpora, and from learner corpora to a new text type, namely scientific articles. Our main results are: (a) the science corpus...
Conference Paper
Full-text available
Conversion is a word formation operation that changes the grammatical category of a word in the absence of overt morphology. Conversion is extremely productive in English (e.g., tunnel, talk). This paper investigates whether distributional information can be used to predict the diachronic direction of conversion for homophonous noun‐verb pairs. We...
Article
Syntax-based semantic spaces are more flexible and can potentially better model semantic relatedness than bag-of-words spaces. Their application is however limited by sparsity and restricted coverage. We address these problems by smoothing syntax-based with word-based spaces and investigate when to choose which prediction. We obtain the best result...
Conference Paper
Full-text available
Distributional methods have proven to excel at capturing fuzzy, graded aspects of meaning (Italy is more similar to Spain than to Germany). In contrast, it is difficult to extract the values of more specific attributes of word referents from distributional representations, attributes of the kind typically found in structured knowledge bases (Italy...
Conference Paper
Full-text available
The Practical Lexical Function model (PLF) is a recently proposed compositional distributional semantic model which provides an elegant account of composition, striking a balance between expressiveness and robustness and performing at the state of the art. In this paper, we identify an inconsistency in PLF between the objective function at trainin...