Massimo Poesio

Queen Mary, University of London | QMUL · School of Electronic Engineering and Computer Science

PhD in Computer Science, U of Rochester, 1994

About

367
Publications
67,707
Reads
9,913
Citations
Introduction
I am a computational linguist: I study language using computational methods. My main current project is the DALI project funded by the ERC, investigating Disagreements in Anaphora and Language Interpretation. In this project we use games-with-a-purpose such as Phrase Detectives to collect large datasets of judgments, which we then use to study anaphora and develop models of anaphoric reference. I also work on the semantics of dialogue, on deception detection, and on using brain data for NLP.

Publications

Article
Full-text available
Knowledge about personally familiar people and places is extremely rich and varied, involving pieces of semantic information connected in unpredictable ways through past autobiographical memories. In this work, we investigate whether we can capture brain processing of personally familiar people and places using subject-specific memories, after tran...
Preprint
Full-text available
With the increasing capabilities of LLMs, recent studies focus on understanding whose opinions are represented by them and how to effectively extract aligned opinion distributions. We conducted an empirical analysis of three straightforward methods for obtaining distributions and evaluated the results across a variety of metrics. Our findings sugge...
Article
Coreference resolution is the task of resolving mentions that refer to the same entity into clusters. The area and its tasks are crucial in natural language processing (NLP) applications. Extensive surveys of this task have been conducted for English and Chinese, but far fewer for Arabic. The few Arabic surveys do not cover recent progress and the c...
Preprint
Full-text available
We introduce ClarQ-LLM, an evaluation framework consisting of bilingual English-Chinese conversation tasks, conversational agents and evaluation metrics, designed to serve as a strong benchmark for assessing agents' ability to ask clarification questions in task-oriented dialogues. The benchmark includes 31 different task types, each with 10 unique...
Preprint
Full-text available
In this work we propose adapting the Minecraft builder task into an LLM benchmark suitable for evaluating LLM ability in spatially orientated tasks, and informing builder agent design. Previous works have proposed corpora with varying complex structures, and human-written instructions. We instead attempt to provide a comprehensive synthetic bench...
Conference Paper
Full-text available
Polysemes are words that can have different senses depending on the context of utterance: for instance, 'newspaper' can refer to an organization (as in 'manage the newspaper') or to an object (as in 'open the newspaper'). Contrary to a large body of evidence coming from psycholinguistics, polysemy has been traditionally modelled in NLP by assuming...
Article
Full-text available
Proper names are linguistic expressions referring to unique entities, such as individual people or places. This sets them apart from other words like common nouns, which refer to generic concepts. And yet, despite both being individual entities, one's closest friend and one's favorite city are intuitively associated with very different pieces of kn...
Article
Full-text available
The meaning of most words in language depends on their context. Understanding how the human brain extracts contextualized meaning, and identifying where in the brain this takes place, remain important scientific challenges. But technological and computational advances in neuroscience and artificial intelligence now provide unprecedented opportuniti...
Article
Full-text available
Polysemy is the type of lexical ambiguity where a word has multiple distinct but related interpretations. In the past decade, it has been the subject of a great many studies across multiple disciplines including linguistics, psychology, neuroscience, and computational linguistics, which have made it increasingly clear that the complexity of polysem...
Preprint
Full-text available
Knowledge about personally familiar people and places is extremely rich and varied, involving pieces of semantic information connected in unpredictable ways through past autobiographical memories. In this work we investigate whether we can capture brain processing of personally familiar people and places using subject-specific memories, after trans...
Method
Full-text available
The Rovereto Emotion and Cooperation Corpus (RECC) is a new resource collected to investigate the relationship between cooperation and emotions in an interactive setting. Previous attempts at collecting corpora to study emotions have shown that these data are often quite difficult to classify and analyse, and coding schemes to explore emotions are u...
Preprint
Full-text available
NLP datasets annotated with human judgments are rife with disagreements between the judges. This is especially true for tasks depending on subjective judgments such as sentiment analysis or offensive language detection. Particularly in these latter cases, the NLP community has come to realize that the approach of 'reconciling' these different subje...
Article
Corpus evidence suggests that in contexts in which the presence of multiple antecedents might favor plural reference, the disadvantage observed for singular reference may disappear if the potential antecedents are combined in a group-like plural entity. We examined the relative salience of antecedents in conditions where the context either made a g...
Article
Interpreting anaphoric references is a fundamental aspect of our language competence that has long attracted the attention of computational linguists. The appearance of ever-larger anaphorically annotated data sets covering more and more anaphoric phenomena in ever-greater detail has spurred the development of increasingly more sophisticated comput...
Preprint
Full-text available
Most existing proposals about anaphoric zero pronoun (AZP) resolution regard full mention coreference and AZP resolution as two independent tasks, even though the two tasks are clearly related. The main issues that need tackling to develop a joint model for zero and non-zero mentions are the difference between the two types of arguments (zero prono...
Preprint
Full-text available
Although several datasets annotated for anaphoric reference/coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitat...
Preprint
Full-text available
Anaphoric reference is an aspect of language interpretation covering a variety of types of interpretation beyond the simple case of identity reference to entities introduced via nominal expressions covered by the traditional coreference task in its most recent incarnation in ONTONOTES and similar datasets. One of these cases that go beyond simple c...
Article
Full-text available
Crowdsourced data are often rife with disagreement, either because of genuine item ambiguity, overlapping labels, subjectivity, or annotator error. Hence, a variety of methods have been developed for learning from data containing disagreement. One of the observations emerging from this work is that different methods appear to work best depending on...
Article
Full-text available
Semantic knowledge about individual entities (i.e., the referents of proper names such as Jacinda Ardern) is fine-grained, episodic, and strongly social in nature, when compared with knowledge about generic entities (the referents of common nouns such as politician). We investigate the semantic representations of individual entities in the brain; a...
Article
Labelling data is one of the most fundamental activities in science, and has underpinned practice particularly in medicine for decades, but also research in corpus linguistics at least since the development of the Brown corpus. With the shift in Artificial Intelligence (AI) towards Machine Learning, the creation of datasets to be used for training...
Chapter
We will now review the use of intercoder agreement measures in CL since Carletta’s original paper in the light of the discussion in the previous sections. We begin with a summary of Krippendorff’s recommendations about measuring reliability (Krippendorff, 2004a, Chapter 11), then discuss how coefficients of agreement have been used in CL to measure...
Article
Full-text available
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on t...
Conference Paper
Full-text available
One of the central aspects of contextualised language models is that they should be able to distinguish the meaning of lexically ambiguous words by their contexts. In this paper we investigate the extent to which the contextualised embeddings of word forms that display multiplicity of sense reflect traditional distinctions of polysemy and homonymy....
Preprint
Full-text available
One of the central aspects of contextualised language models is that they should be able to distinguish the meaning of lexically ambiguous words by their contexts. In this paper we investigate the extent to which the contextualised embeddings of word forms that display multiplicity of sense reflect traditional distinctions of polysemy and homonymy....
Preprint
Full-text available
Issues with coreference resolution are one of the most frequently mentioned challenges for information extraction from the biomedical literature. Thus, the biomedical genre has long been the second most researched genre for coreference resolution after the news domain, and the subject of a great deal of research for NLP in general. In recent years...
Preprint
Full-text available
In pro-drop languages like Arabic, Chinese, Italian, Japanese, Spanish, and many others, unrealized (null) arguments in certain syntactic positions can refer to a previously introduced entity, and are thus called anaphoric zero pronouns. The existing resources for studying anaphoric zero pronoun interpretation are however still limited. In this pape...
Conference Paper
Full-text available
The state-of-the-art on basic, single-antecedent anaphora has greatly improved in recent years. Researchers have therefore started to pay more attention to more complex cases of anaphora such as split-antecedent anaphora, as in “Time-Warner is considering a legal challenge to Telecommunications Inc’s plan to buy half of Showtime Networks Inc–a move...
Preprint
Full-text available
The state-of-the-art on basic, single-antecedent anaphora has greatly improved in recent years. Researchers have therefore started to pay more attention to more complex cases of anaphora such as split-antecedent anaphora, as in Time-Warner is considering a legal challenge to Telecommunications Inc's plan to buy half of Showtime Networks Inc-a move...
Article
Full-text available
Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained...
Conference Paper
Full-text available
Now that the performance of coreference resolvers on the simpler forms of anaphoric reference has greatly improved, more attention is devoted to more complex aspects of anaphora. One limitation of virtually all coreference resolution models is the focus on single-antecedent anaphors. Plural anaphors with multiple antecedents-so-called split-anteced...
Conference Paper
Full-text available
This paper explores the impact on language proficiency of comprehensible output applied in computer assisted language learning (CALL). Targeting speakers of intermediate level, we adapted a visually-grounded dialogue task, optimizing for language acquisition. The task was implemented as a mobile application where learners are organized in pairs and...
Chapter
Ambiguity is one of the most important aspects of natural language, but one of the most difficult to define and distinguish from related (and unrelated) phenomena. In this chapter, ambiguity is first differentiated from other sources of indeterminacy, such as unspecificity, and other sources of indefiniteness, such as vagueness. Different types of...
Preprint
Full-text available
Now that the performance of coreference resolvers on the simpler forms of anaphoric reference has greatly improved, more attention is devoted to more complex aspects of anaphora. One limitation of virtually all coreference resolution models is the focus on single-antecedent anaphors. Plural anaphors with multiple antecedents-so-called split-anteced...
Preprint
Full-text available
No neural coreference resolver for Arabic exists; in fact, we are not aware of any learning-based coreference resolver for Arabic since Bjorkelund and Kuhn (2014). In this paper, we introduce a coreference resolution system for Arabic based on Lee et al.'s end-to-end architecture combined with the Arabic version of BERT and an external mention detec...
Conference Paper
Full-text available
Homonymy is often used to showcase one of the advantages of context-sensitive word embedding techniques such as ELMo and BERT. In this paper we want to shift the focus to the related but less exhaustively explored phenomenon of polysemy, where a word expresses various distinct but related senses in different contexts. Specifically, we aim to i) inv...
Article
Recently, Peterson et al. provided evidence of the benefits of using probabilistic soft labels generated from crowd annotations for training a computer vision model, showing that using such labels maximizes performance of the models over unseen data. In this paper, we generalize these results by showing that training with soft labels is an effectiv...
Conference Paper
Full-text available
When evaluating model performance on automated annotation tasks such as anaphora resolution and specifically pronoun resolution, the gold standards often postulate a single correct referent for each referring expression. Previous research on annotator disagreement however found that in some cases there might not actually be a single correct referen...
Preprint
Full-text available
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text expressing references to entities. NER research is often focused on flat entities only (flat NER), ignoring the fact that entity references can be nested, as in [Bank of [China]] (Finkel and Manning, 2009). In this paper, we...
Chapter
Full-text available
As the use of Games-With-A-Purpose (GWAPs) broadens, their annotation schemes have increased in complexity. The types of annotations required within NLP are an example of labelling that can involve varying complexity of annotations. Assigning more complex tasks to more skilled players through a progression mechanism can achieve higher accuracy in t...
Preprint
Full-text available
We propose a multi-task learning-based neural model for bridging reference resolution, tackling two key challenges faced by bridging reference resolution. The first challenge is the lack of large corpora annotated with bridging references. To address this, we use multi-task learning to help bridging reference resolution with coreference resolution....
Article
Effective information management has long been a problem in organisations that are not of a scale that they can afford their own department dedicated to this task. Growing information overload has made this problem even more pronounced. On the other hand we have recently witnessed the emergence of intelligent tools, packages and resources that made...
Article
Full-text available
This article has been withdrawn at the request of the author(s). The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at https://www.elsevier.com/about/our-business/policies/article-withdrawal.
Preprint
Full-text available
Anaphora resolution (coreference) systems designed for the CONLL 2012 dataset typically cannot handle key aspects of the full anaphora resolution task such as the identification of singletons and of certain types of non-referring expressions (e.g., expletives), as these aspects are not annotated in that corpus. However, the recently released datase...
Article
Within traditional games design, incorporating progressive difficulty is considered of fundamental importance. But despite the widespread intuition that progression could have clear benefits in Games-With-A-Purpose (GWAPs), e.g., for training non-expert annotators to produce more complex judgements, progression is not in fact a prominent feature of...
Preprint
Full-text available
We present an automated evaluation method to measure fluidity in conversational dialogue systems. The method combines various state-of-the-art natural language tools into a classifier, and human ratings on these dialogues to train an automated judgment model. Our experiments show that the results are an improvement on existing metrics for measuring...
Conference Paper
Full-text available
We argue that the mechanics of 'Ville type Free-To-Play (F2P) games in general, and incremental games in particular, is especially suited for Games-With-A-Purpose. We demonstrate this through WordClicker, an incremental game whose mechanics is designed for text labelling. We believe the design and mechanics used are highly transferable to other gam...
Conference Paper
Full-text available
In this paper we present Wormingo, a new Game-with-a-Purpose for anaphoric annotation. It introduces the motivation-annotation paradigm which uses linguistic puzzles and other widely known gamification techniques and word game mechanics to motivate players to carry out anaphoric annotation tasks. In a preliminary experiment, the game was tested o...
Conference Paper
Full-text available
In this paper we present WordClicker, a clicker game for text annotation. We believe the mechanics of 'Ville-type Free-To-Play (F2P) games in general, and clicker games in particular, are well suited for GWAPs (Games-With-A-Purpose). WordClicker was developed as one component of a suite of GWAPs meant to cover all aspects of language interpr...
Preprint
Full-text available
Mention detection is an important aspect of the annotation task and interpretation process for applications such as coreference resolution. In this work, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a state-of-the-art coreference resolution system; th...
Conference Paper
Full-text available
The common practice in coreference resolution is to identify and evaluate the maximum span of mentions. The use of maximum spans tangles coreference evaluation with the challenges of mention boundary detection like prepositional phrase attachment. To address this problem, minimum spans are manually annotated in smaller corpora. However, this additi...