Tommaso Caselli

University of Groningen | RUG · Center for Language and Cognition Groningen (CLCG)

Doctor of Philosophy

110
Publications
16,590
Reads
1,365
Citations

Publications (110)
Preprint
Full-text available
Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as...
Article
This paper contributes to ongoing scholarly debates on the merits and limitations of computational legal text analysis by reflecting on the results of a research project documenting exceptional COVID‐19 management measures in Europe. The variety of exceptional measures adopted in countries characterized by different legal systems and natural langua...
Preprint
Full-text available
Biographical event detection is a relevant task for the exploration and comparison of the ways in which people's lives are told and represented. In this sense, it may support several applications in digital humanities and in works aimed at exploring bias about minoritized groups. Despite that, there are no corpora and models specifically designed f...
Article
Full-text available
Studies on the applicability of heterogeneous semantically interoperable corpora are rare. We investigate to what extent reusability (both of systems and of annotations) is entailed by corpora whose interoperability is based on compliance to standards. In particular, we look at event detection in English texts, supported by the ISO-TimeML annotatio...
Article
Full-text available
Microaggressions are subtle manifestations of bias (Breitfeller et al. 2019). These demonstrations of bias can often be classified as a subset of abusive language. However, not much focus has been placed on the recognition of these instances. As a result, limited data is available on the topic, and only in English. Being able to detect microaggress...
Preprint
Full-text available
The Event Causality Identification Shared Task of CASE 2022 involved two subtasks working on the Causal News Corpus. Subtask 1 required participants to predict if a sentence contains a causal relation or not. This is a supervised binary classification task. Subtask 2 required participants to identify the Cause, Effect and Signal spans per causal se...
Preprint
Full-text available
Different linguistic expressions can conceptualize the same event from different viewpoints by emphasizing certain participants over others. Here, we investigate a case where this has social consequences: how do linguistic expressions of gender-based violence (GBV) influence who we perceive as responsible? We build on previous psycholinguistic rese...
Article
Full-text available
Abstraction enables us to categorize experience, learn new information, and form judgments. Language arguably plays a crucial role in abstraction, providing us with words that vary in specificity (e.g., highly generic: tool vs. highly specific: muffler). Yet, human-generated ratings of word specificity are virtually absent. We hereby present a data...
Conference Paper
Full-text available
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Ther...
Article
Full-text available
Texts are not monolithic entities but rather coherent collections of micro illocutionary acts which help to convey a unitary message of content and purpose. Identifying such text segments is challenging because they require a fine-grained level of analysis even within a single sentence. At the same time, accessing them facilitates the analysis of t...
Chapter
Connotation is a dimension of lexical meaning at the semantic-pragmatic interface. Connotations can be used to express points of view, perspectives, and implied emotional associations. Variations in the connotations of the same lexical item can occur at different levels of analysis: from individuals, to speech communities, specific domains, and even ti...
Preprint
Full-text available
Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Ther...
Conference Paper
Full-text available
This paper proposes a methodology for investigating populism by analyzing proto-slogans, nominal utterances (NUs) typical of a political community on social media. We extracted more than 700...
Preprint
Full-text available
This paper describes the TOKOFOU system, an ensemble model for misinformation detection tasks based on six different transformer-based pre-trained encoders, implemented in the context of the COVID-19 Infodemic Shared Task for English. We fine tune each model on each of the task's questions and aggregate their prediction scores using a majority voti...
Poster
Full-text available
In this contribution we present the Diachronic News and Travel (DNT) corpus, a collection of English texts covering different text genres and two broad time periods. More precisely, the corpus is made up of both contemporary and historical texts of three genres, namely newspaper articles, travel reports and tourist guides. The corpus, freely availab...
Article
Full-text available
Previous studies indicate that the capacity of media to influence the salience of issues in the public realm is strongly dependent on specific attributes that characterize these issues. In this work, we investigate two internal aspects of issue types related to the attribute of duration. First, we address whether news stories belonging to different...
Conference Paper
Full-text available
The Hate Speech Detection (HaSpeeDe 2) task is the second edition of a shared task on the detection of hateful content in Italian Twitter messages. HaSpeeDe 2 is composed of a Main task (hate speech detection) and two Pilot tasks (stereotype and nominal utterance detection). Systems were challenged along two dimensions: (i) time, with test data co...
Preprint
Full-text available
In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have collected and made available to the public. We present the results of a detailed co...
Chapter
2019 has been characterized by worldwide waves of protests. Each country’s protests are different but there appear to be common factors. In this paper we present two approaches for identifying protest events in news in English. Our goal is to provide political science and discourse analysis scholars with tools that may facilitate the understanding o...
Article
Full-text available
Conceptual concreteness and categorical specificity are two continuous variables that allow distinguishing, for example, justice (low concreteness) from banana (high concreteness) and furniture (low specificity) from rocking chair (high specificity). The relation between these two variables is unclear, with some scholars suggesting that they might...
Conference Paper
Full-text available
Abusive language detection is an unsolved and challenging problem for the NLP community. Recent literature suggests various approaches to distinguish between different language phenomena (e.g., hate speech vs. cyberbullying vs. offensive language) and factors (degree of explicitness and target) that may help to classify different abusive language p...
Preprint
Full-text available
The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based...
Article
Full-text available
Discussion on social media over controversial topics can easily escalate to harsh interactions. Being able to predict whether a certain post will be controversial, and what reactions it might give rise to, could help moderators provide a better experience for all users. We develop a battery of distant supervised models that use Facebook reactions a...
Preprint
Full-text available
This paper reports on a set of experiments with different word embeddings to initialize a state-of-the-art Bi-LSTM-CRF network for event detection and classification in Italian, following the EVENTI evaluation exercise. The network obtains a new state-of-the-art result by improving the F1 score for detection by 1.3 points, and by 6.5 points for c...
Article
Full-text available
Will reading different stories about the same event in the world result in a similar image of the world? Will reading the same story by different people result in a similar proxy for experiencing the story? The answer to both questions is no because language is abstract by definition and relies on our episodic experience to turn a story into a more...
Conference Paper
Full-text available
In this paper, we describe the Circumstantial Event Ontology (CEO), a newly developed ontology for calamity events that models semantic circumstantial relations between event classes, where we define circumstantial as inferred implicit causal relations. The circumstantial relations are inferred from the assertions of the event classes that involve...
Conference Paper
Full-text available
In this article we review Temporal Processing systems that participated in the TempEval-3 task as a basis to develop our own system, which we also present and release. The system incorporates high-level lexical semantic features, obtaining the best scores for event detection (F1-Class 72.24) and the second-best result for temporal relation classificatio...
Chapter
Full-text available
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference seri...
Chapter
Sources, in the form of selected Facebook pages, can be used as indicators of hate-rich content. Polarized distributed representations created over such content prove superior to generic embeddings in the task of hate speech detection. The same content seems to carry too weak a signal to proxy silver labels in a distant supervised setting. However,...
Chapter
Full-text available
EVALITA is a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The general objective of EVALITA is to promote the development of language and speech technologies for the Italian language, providing a shared framework where different systems and approaches can be evaluated in a consistent ma...
Chapter
This chapter presents the language specific adaptation of the TimeML annotation scheme to Italian and the creation of the Ita-TimeBank, a language resource composed of two corpora manually annotated with temporal and event information. Particular attention is given to the methodology followed in the development of the corpora: the annotation guidel...
Poster
Full-text available
This paper presents a new resource, called Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset, available online, are describe...
Chapter
Different events and their reception in different reader communities may give rise to controversy. We propose a distant supervised entropy-based model that uses Facebook reactions as proxies for predicting news controversy. We prove the validity of this approach by running within- and across-source experiments, where different news sources are conc...
Conference Paper
Full-text available
This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted on two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted in two subtask...
Conference Paper
Full-text available
In this paper we present PIERINO (PIattaforma per l'Estrazione e il Recupero di INformazione Online), a system that was implemented in collaboration with the Italian Ministry of Education, University and Research to analyse the citizens' comments given in the #labuonascuola survey. The platform includes various levels of automatic analysis such as key-...
Chapter
The annual conference CLIC–it ("Italian Conference on Computational Linguistics") is an initiative of the "Italian Association of Computational Linguistics" (AILC – www.ai-lc.it) which is intended to meet the need for a national and international forum for the promotion and dissemination of high-level original research in the field of Computati...
Chapter
EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for the Italian language: since 2007 shared tasks have been proposed covering the analysis of both written and spoken language with the aim of enhancing the development and dissemination of resources and technologies for Italian. EVALITA is an initiative of the Itali...
Conference Paper
Full-text available
The paper proposes a new evaluation exercise, meant to shed light on the syntax-semantics interface for the analysis of written Italian and resulting from the combination of the EVALITA 2014 dependency parsing and event extraction tasks. It aims at investigating the cross-fertilization of tasks, generating a new resource combining dependency and ev...
Conference Paper
Full-text available
In this paper we describe FacTA, a new task connecting the evaluation of factuality profiling and temporal anchoring, two strictly related aspects in event processing. The proposed task aims at providing a complete evaluation framework for factuality profiling, at taking the first steps in the direction of narrative container evaluation f...
Article
Full-text available
Shared task evaluation campaigns represent a well established form of competitive evaluation, an important opportunity to propose and tackle new challenges for a specific research area and a way to foster the development of benchmarks, tools and resources. The advantages of this approach are evident in any experimental field, including the area of...
Conference Paper
Full-text available
In this paper, we present a rich contextual perspective on the lexicon and background knowledge for the purpose of deep semantic parsing. In the project Understanding Language By machine, we address various aspects of semantics in relation to i.) reference to entities and event instances, ii.) modeling of author and reader perspectives. Le...
Article
Full-text available
This work presents a proposal for the development of a natural language processing module for event and temporal analysis of biographies as available in Wikipedia. At the current level of development, we restricted the extraction to temporally anchored events as they represent salient information which can be further used to extract additional even...
Article
Full-text available
We present an analysis of a high-level semantic task, the construction of cross-document event timelines from SemEval 2015 Task 4: TimeLine, to trace errors back to the components of our pipeline system. Event timeline extraction requires many different Natural Language Processing tasks, including entity and event detection, coreference resolution...
Chapter
Full-text available
CLiC-it 2015 is held in Trento on December 3-4, 2015, hosted and locally organized by Fondazione Bruno Kessler (FBK), one of the most important Italian research centers in CL. The organization of the conference is the result of a fruitful joint effort of different research groups (Università di Torino, Università di Roma Tor Vergata a...
Conference Paper
Full-text available
In this work we present a methodology for the annotation of Attribution Relations (ARs) in speech which we apply to create a pilot corpus of spoken informal dialogues. This represents the first step towards the creation of a resource for the analysis of ARs in speech and the development of automatic extraction systems. Despite its relevance for...
Conference Paper
Full-text available
This report describes the EVENTI (EValuation of Events aNd Temporal Information) task organized within the EVALITA 2014 evaluation campaign. The EVENTI task aims at evaluating the performance of Temporal Information Processing systems on a corpus of Italian news articles. Motivations for the task, datasets, evaluation metrics, and results obtained...
Conference Paper
Full-text available
This paper reports the description and scores of our system, FBK-TR, which participated in the SemEval 2014 task #1 "Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Entailment". The system consists of two parts: one for computing semantic relatedness, based on SVM, and the other for iden...
Conference Paper
Full-text available
Recently, the task of measuring semantic similarity between given texts has drawn much attention from the Natural Language Processing community. The task becomes especially interesting when it comes to measuring the semantic similarity between different-sized texts, e.g., paragraph-sentence, sentence-phrase, phrase-word, etc. In this paper, we,...