Rachele Sprugnoli

Università di Parma | UNIPR · Dipartimento di Lettere, Arti, Storia e Società

PhD

About

107 Publications
10,540 Reads
726 Citations
Citations since 2016: 73 research items, 572 citations
[Chart: citations per year, 2016-2022]
Introduction
Researcher (RTDA) at the University of Parma, member of the board of AIUCD (the Italian association for the Digital Humanities) and social media manager of AILC (the Italian association of computational linguistics). My research interests are: creation of linguistic resources, semantic annotation of texts, automatic processing of temporal information, sentiment analysis, digital humanities.

Publications (107)
Presentation
Full-text available
In this contribution we aim to analyse a collection of data drawn from Twitter on two topics: politics and music. The analysis will be carried out on three levels. First, we will quantify the number of mentions, hashtags and emoji to understand their impact on the data as a whole. Then, a random sample of 100 tweets per topic...
Poster
Full-text available
This contribution presents the current status of the ERC project “LiLa: Linking Latin”, the main objective of which is to connect and exploit the wealth of existing linguistic resources for Latin by making them interoperable, through the creation of a Knowledge Base following Linked Data standards. We describe the textual and lexical resources link...
Poster
Full-text available
This contribution presents the first steps towards the analysis of Leonardo Fibonacci's Liber Abbaci using computational linguistics methods. The work is currently carried out in the context of a joint research project between the Tuscany Region and the University of Pisa with the help of an interdisciplinary team.
Presentation
Full-text available
The workshop is part of the series "Metodi e strumenti per gli umanisti digitali" (Methods and tools for digital humanists) organized by Bembus, promoted by Il Liutaio nel Bazaar and funded through the student activities budget of Ca' Foscari University of Venice. Curated by: Marco Sartor (University of Parma). Organization: Mara Caron.
Conference Paper
Full-text available
This contribution presents the first steps towards the analysis of Leonardo Fibonacci's Liber Abbaci using computational linguistics methods. The work is currently carried out in the context of a joint research project between the Tuscany Region and the University of Pisa with the help of an interdisciplinary team.
Conference Paper
Full-text available
This contribution presents the current status of the ERC project "LiLa: Linking Latin", the main objective of which is to connect and exploit the wealth of existing linguistic resources for Latin by making them interoperable, through the creation of a Knowledge Base following Linked Data standards. We describe the textual and lexical resources link...
Presentation
Full-text available
In our talk, we present the structure and the linguistic resources currently included in the LiLa Knowledge Base, i.e. a collection of multifarious textual and lexical resources for Latin described with the same vocabulary of knowledge description and interlinked according to the principles of the Linked Data paradigm. We also present a set of lemm...
Presentation
Full-text available
Presentation at the Digital Dante Days, a two-day international symposium on the past, present and future of digital scholarship on Dante’s work, organized by "Dipartimento di Studi Umanistici", VEDPH, Ca' Foscari University.
Presentation
Full-text available
The presentation describes the EvaLatin 1.0 corpus, which contains the annotated training and evaluation data released for the EvaLatin 2020 evaluation campaign.
Presentation
Full-text available
In this paper we present the methodology followed to extend a Latin sentiment lexicon (called LatinAffectus), the process of inclusion of the lexicon in a knowledge base of interoperable linguistic resources for Latin and one use case performed on the treebank of Dante Alighieri’s Latin works annotated following the Universal Dependencies guideline...
Presentation
Full-text available
Presentation given for the PhD course "Emotion-oriented systems" at the University of Turin. Content updated with respect to the previous presentation entitled "Sentiment Analysis for Latin: a Journey from Seneca to Thomas Aquinas".
Presentation
Full-text available
This paper presents a new linguistic resource for Italian, called MultiEmotions-It, containing comments on music videos and advertisements posted on YouTube and Facebook. These comments are manually annotated along four dimensions: relatedness, opinion polarity, emotions and sarcasm. For the annotation of emotions we adopted...
Presentation
Full-text available
This paper presents the early stages of the development of a new treebank containing all of Dante Alighieri’s Latin works. In particular, it describes the conversion of the original TEI-XML files to CoNLL-U, the creation of a gold standard, the process of training four annotators and the evaluation of the syntactic annotation in terms of inter-anno...
Article
Full-text available
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. In addition, we release vectors pre-trained on the “O...
Presentation
Full-text available
While the main applications of resources and tools for sentiment analysis typically fall within the scope of fields like customer experience and social media monitoring, there is an increasing interest in extending their range to texts written in ancient and historical languages. Such interest mirrors the substantial growth of the area dedicated to...
Poster
Full-text available
In this contribution we present the Diachronic News and Travel (DNT) corpus, a collection of English texts covering different text genres and two broad time periods. More precisely, the corpus is made up of both contemporary and historical texts of three genres, namely newspaper articles, travel reports and tourist guides. The corpus, freely availab...
Conference Paper
Full-text available
In this paper we introduce the DaDoEval shared task at EVALITA 2020, aimed at automatically assigning temporal information to documents written in Italian. The evaluation exercise comprises three levels of temporal granularity, from coarse-grained to year-based, and includes two types of test sets, either having the same genre as the training set,...
Conference Paper
Full-text available
This paper presents the early stages of the development of a new treebank containing all of Dante Alighieri's Latin works. In particular, it describes the conversion of the original TEI-XML files to CoNLL-U, the creation of a gold standard, the process of training four annotators and the evaluation of the syntactic annotation in terms of inter-ann...
Conference Paper
Full-text available
This paper presents a new linguistic resource for Italian, called MultiEmotions-It, containing comments on music videos and advertisements posted on YouTube and Facebook. These comments are manually annotated along four dimensions: relatedness, opinion polarity, emotions and sarcasm. For the annotation of emotions we adopted...
Conference Paper
Full-text available
In this paper, we describe the process of inclusion of a prior polarity lexicon of Latin lemmas, called LatinAffectus, in a knowledge base of interoperable linguistic resources developed within the "LiLa: Linking Latin" project. More specifically, a manually-curated list of lemma-sentiment pairs is linked to a comprehensive collection of Latin lemm...
Presentation
Full-text available
Presentation for the 3rd Workshop on Humanities in the Semantic Web (WHiSe), co-located with the 15th Extended Semantic Web Conference (ESWC 2020) and held online on June 2, 2020. The paper has won the Best Paper Award of the workshop. ABSTRACT: In this paper, we describe the process of inclusion of a prior polarity lexicon of Latin lemmas, called...
Conference Paper
Full-text available
Sentiment lexicons are essential for developing automatic sentiment analysis systems, but the resources currently available mostly cover modern languages. Lexicons for ancient languages are few and not evaluated with high-quality gold standards. However, the study of attitudes and emotions in ancient texts is a growing field of research which poses...
Conference Paper
Full-text available
This paper describes the first edition of EvaLatin, a campaign entirely devoted to the evaluation of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, are aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts t...
Chapter
Full-text available
To identify the linguistic traits of interest in a corpus of almost three thousand essays and to annotate them consistently, it was necessary to develop several software tools. These tools fall into two categories: on the one hand, text analysis modules were developed that automatically recognize certain tra...
Poster
Full-text available
We present an ongoing project aimed at creating the National Edition of Alcide De Gasperi’s letters in digital format. Our main goal is to systematically collect and transcribe a large number of private and public letters, present in different archives, written or received by De Gasperi throughout his life, and to shed light on all the critical s...
Conference Paper
Full-text available
We present an ongoing project aimed at creating the National Edition of Alcide De Gasperi's letters in digital format. Our main goal is to systematically collect and transcribe a large number of private and public letters, present in different archives, written or received by De Gasperi throughout his life, and to shed light on all the critical s...
Presentation
Full-text available
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. A qualitative evaluation is also performed on the emb...
Poster
Full-text available
In this paper we present a multigenre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi’s public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and parl...
Presentation
Full-text available
Research communication at CLiC-it 2019, presenting the following paper: Sprugnoli, R., & Tonelli, S. (2019). Novel Event Detection and Classification for Historical Texts. Computational Linguistics, 45(2), 229-265.
Conference Paper
Full-text available
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. A qualitative evaluation is also performed on the em...
Conference Paper
Full-text available
In this paper we present a multi-genre corpus spanning 50 years of European history. It contains a comprehensive collection of Alcide De Gasperi's public documents, 2,762 in total, written or transcribed between 1901 and 1954. The corpus comprises different types of texts, including newspaper articles, propaganda documents, official letters and par...
Poster
Full-text available
In recent years, word embeddings have become important resources for many Natural Language Processing tasks. Several sets of pre-trained word vectors have been released, trained on huge amounts of contemporary text. Interest in this type of distributional approach has recently emerged in the Digital Humanities community as well, with stud...
Article
Full-text available
This article proposes the first comparative study of four years of Italian conferences in the fields of Digital Humanities and Computational Linguistics. More specifically, we created a corpus consisting of the contributions presented in the AIUCD and CLiC-it conferences between 2014 and 2017 to which we applied a multidimensional analysis taking i...
Article
Full-text available
Event processing is an active area of research in the Natural Language Processing community, but resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts. Particularly in the current era of massive digitization of...
Article
Full-text available
In this work, we present LOD Navigator, a data visualisation and exploration tool to track the lives and trajectories of Italian Shoah Victims. We take advantage of the work done at the Contemporary Jewish Documentation Center in Milan (CDEC), leading to the publication of a database of Linked Open Data (LOD) containing information about the life a...
Presentation
Full-text available
In this proposal we describe the results of a project aiming at tracing the movements of Trentino people who were deported to the Third Reich camps during World War II. More specifically, we performed the semantic annotation, georeferencing and visualization of data collected by expert historians. This work aims to shed light on the stories of peop...
Poster
Full-text available
In this abstract we propose a method for creating co-word networks starting from keyphrases automatically extracted from the full text of scientific papers. This approach aims to overcome the limitations of the methodologies traditionally used in bibliometrics with the final goal of identifying the themes considered crucial in a research field and...
Presentation
Full-text available
This paper presents the application of a neural architecture to the identification of place names in English historical texts. We test the impact of different word embeddings and we compare the results to the ones obtained with the Stanford NER module of CoreNLP before and after the retraining using a novel corpus of manually annotated historical t...
Conference Paper
Full-text available
This paper presents the application of a neural architecture to the identification of place names in English historical texts. We test the impact of different word embeddings and we compare the results to the ones obtained with the Stanford NER module of CoreNLP before and after the retraining using a novel corpus of manually annotated historical t...
Conference Paper
Full-text available
We present a project aimed at studying the evolution of students' writing skills in a temporal span of 15 years (from 2001 to 2016), analysing in particular the impact of neo-standard Italian. More than 2,500 essays have been transcribed and annotated by teachers according to 28 different linguistic traits. We present here the annotation process to...
Conference Paper
Full-text available
This paper reports on the systems the InriaFBK Team submitted to the EVALITA 2018 Shared Task on Hate Speech Detection in Italian Twitter and Facebook posts (HaSpeeDe). Our submissions were based on three separate classes of models: a model using a recurrent layer, an n-gram-based neural network and a LinearSVC. For the Facebook task and the two cro...
Conference Paper
Full-text available
Although WhatsApp is used by teenagers as a major channel of cyberbullying, such interactions remain invisible because the app's privacy policies do not allow ex-post data collection. Indeed, most of the information on these phenomena relies on surveys of self-reported data. In order to overcome this limitation, we describe in this paper t...
Conference Paper
Full-text available
In this paper, we describe two systems for predicting message-level offensive language in German tweets: one discriminates between offensive and not offensive messages, and the second performs a fine-grained classification by recognizing also classes of offense. Both systems are based on the same approach, which builds upon Recurrent Neural Network...
Chapter
This paper presents the De Gasperi corpus, a freely available linguistic resource for Italian annotated with temporal information at different levels: i.e., events, temporal expressions, temporal signals and temporal relations. The De Gasperi corpus has been employed to understand how well systems built for contemporary Italian perform on historica...
Conference Paper
Full-text available
This paper results from observations that have been made while studying ontological and linked data-based approaches to the encoding of biographical data. Based on certain issues we discovered and which will be described here, we aim to call for a collaborative work towards guidelines for modelling biographical data in the standard Semantic Web rep...
Poster
Full-text available
The digitization of epistolaries is extremely important for the preservation and study of the cultural and historical patrimony of literary correspondence. In recent years, several small and large-scale projects have been carried out. Many of these initiatives are based on collaborative work adopting a crowdsourcing approach and using web-based tra...
Poster
Full-text available
Non-fictional travel writings are powerful sources of information for many research areas, such as art history, ethnography, geography and cultural history. By collecting several books on the same place, it is possible to study how material and cultural aspects change over time. Moreover, travel writings can give insight not only into the places an...
Preprint
Non-fictional travel writings are powerful sources of information for many research areas, such as art history, ethnography, geography and cultural history. By collecting several books on the same place, it is possible to study how material and cultural aspects change over time. Moreover, travel writings can give insight not only into the places an...
Conference Paper
Full-text available
Code-mixing is the alternation between two or more languages in the same text. This phenomenon is very relevant in the travel domain, since it can provide new insight into the way foreign cultures are perceived and described to the readers. In this paper, we analyse English-Italian code-mixing in historical English travel writings about Italy. We ret...
Chapter
This chapter presents the language specific adaptation of the TimeML annotation scheme to Italian and the creation of the Ita-TimeBank, a language resource composed of two corpora manually annotated with temporal and event information. Particular attention is given to the methodology followed in the development of the corpora: the annotation guidel...
Article
Full-text available
This paper describes the development of a multilingual and multigenre manually annotated speech dataset, freely available to the research community as ground truth for the evaluation of automatic transcription systems and spoken language translation systems. The dataset includes two video genres—television broadcast news and talk-shows—and covers F...
Article
Full-text available
EVALITA, the evaluation campaign of Natural Language Processing and Speech Tools for the Italian language, was organised for the fifth time in 2016. Six tasks, covering both re-runs and completely new tasks, and an IBM-sponsored challenge, attracted a total of 34 submissions. An innovative aspect at this edition was the focus on social med...
Poster
Full-text available
We present RAMBLE ON, an application integrating a pipeline for frame-based information extraction and an interface to track and display movement trajectories. The code of the extraction pipeline and a navigator are freely available; moreover we display in a demonstrator the outcome of a case study carried out on trajectories of notable persons of...
Poster
Full-text available
This paper presents a new resource, called Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset, available online, are describe...
Article
Full-text available
In 2013 a collaboration was started between the Digital Humanities research unit and the Italian-German Historical Institute at Fondazione Bruno Kessler, whose goal was to develop tools and strategies to give new insight into the public documents written by Alcide De Gasperi. Through the analysis of textual occurrences, semantic structures, and tem...
Conference Paper
Full-text available
This paper presents L-KD, a tool that relies on available linguistic and knowledge resources to perform keyphrase clustering and labelling. The aim of L-KD is to help find and trace themes in English and Italian text data, represented by groups of keyphrases and associated domains. We perform an evaluation of the top-ranked domains using the 2...
Conference Paper
Full-text available
This paper describes the design and reports the results of two questionnaires. The first was created to collect information about the interest of companies working in Italian text/speech analytics in the evaluation campaign EVALITA; the second to gather comments and suggestions for the future of the eval...
Article
We present an overview of event definition and processing spanning 25 years of research in NLP. We first provide linguistic background to the notion of event, and then present past attempts to formalize this concept in annotation standards to foster the development of benchmarks for event extraction systems. This ranges from MUC-3 in 1991 to the Ti...
Article
The application of research practices and methodologies from the Information and Communication Technologies to Humanities studies is having a great impact on the way humanities research is being conducted. However, although many applications have been developed to automatically analyse document collections from the historical or the literary domain...
Conference Paper
Full-text available
Data visualisation has become one of the most relevant DH topics, due to the advent of Big Data in Humanities research practices, and to the need to make complex statistical analyses accessible to users without a technical background. Although several visualisation libraries, such as d3.js, are now freely available online and are relatively easy to...
Poster
Full-text available
A playful activity to be performed by pairs of users in order to collect similarity judgments about artworks. Winner of the "DH Awards 2015" in the category "Best use of DH for fun". Center for Information Technology, Digital Humanities Research Unit (@DH_FBK).
Conference Paper
Full-text available
This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted on two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted of two subtask...
Conference Paper
Full-text available
In this paper we present PIERINO (PIattaforma per l'Estrazione e il Recupero di INformazione Online), a system that was implemented in collaboration with the Italian Ministry of Education, University and Research to analyse the citizens' comments given in #labuonascuola survey. The platform includes various levels of automatic analysis such as key-...
Conference Paper
Full-text available
This paper presents QUANDHO (QUestion ANswering Data for italian HistOry), an Italian question answering dataset created to cover a specific domain, i.e. the history of Italy in the first half of the XX century. The dataset includes questions manually classified and annotated with Lexical Answer Types, and a set of question-answer pairs. This resou...