Marco Passarotti

  • Università Cattolica del Sacro Cuore

About

111
Publications
12,147
Reads
503
Citations

Publications (111)
Article
Full-text available
In recent years, an ever-growing number of linguistic resources and automatic systems for sentiment analysis have been developed, covering a wide range of languages. However, this field remains little explored for texts written in Classical languages. Working on such languages means dealing with peculiar textual genres such...
Presentation
Full-text available
Although the main applications of resources and tools for sentiment analysis typically fall in fields such as social media and customer experience monitoring, there is an increasing interest in extending their range to texts written in ancient and historical languages. Such interest mirrors the substantial growth of the area dedicated to building a...
Conference Paper
Full-text available
This article presents the EvaLatin 1.0 corpus, developed for the first evaluation campaign of Natural Language Processing tools for Latin. The campaign focused on two linguistic analyses, namely lemmatization and part-of-speech tagging. Particular attention was paid to the construction...
Presentation
Full-text available
In this contribution we analyze a collection of data drawn from Twitter concerning two topics: politics and music. The analysis is carried out on three levels. First, we quantify the number of mentions, hashtags, and emoji to understand their impact on the data as a whole. Then a random sample of 100 tweets per topic is taken...
Article
Full-text available
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant im...
Poster
Full-text available
This contribution presents the current status of the ERC project “LiLa: Linking Latin”, the main objective of which is to connect and exploit the wealth of existing linguistic resources for Latin by making them interoperable, through the creation of a Knowledge Base following Linked Data standards. We describe the textual and lexical resources link...
Poster
Full-text available
This contribution presents the first steps towards the analysis of Leonardo Fibonacci's Liber Abbaci using computational linguistics methods. The work is currently carried out in the context of a joint research project between the Tuscany Region and the University of Pisa with the help of an interdisciplinary team.
Conference Paper
Full-text available
This paper describes the organization and the results of the second edition of EvaLatin, the campaign for the evaluation of Natural Language Processing tools for Latin. The three shared tasks proposed in EvaLatin 2022, i.e. Lemmatization, Part-of-Speech Tagging and Features Identification, aim to foster research in the field of language tech...
Conference Paper
Full-text available
This contribution presents the first steps towards the analysis of Leonardo Fibonacci's Liber Abbaci using computational linguistics methods. The work is currently carried out in the context of a joint research project between the Tuscany Region and the University of Pisa with the help of an interdisciplinary team.
Conference Paper
Full-text available
This contribution presents the current status of the ERC project "LiLa: Linking Latin", the main objective of which is to connect and exploit the wealth of existing linguistic resources for Latin by making them interoperable, through the creation of a Knowledge Base following Linked Data standards. We describe the textual and lexical resources link...
Chapter
In this paper we present a set of annotated data and the results of a number of unsupervised experiments for the analysis of sentiment in Latin poetry. More specifically, we describe a small gold standard made of eight poems by Horace, in which each sentence is labeled manually for the sentiment using a four-value classification (positive, negative...
Chapter
The Liber Abbaci (13th century) is a milestone in the history of mathematics and accounting. Due to the late stage of Latin, its features and its very specialized content, it also represents a unique resource for scholars working on Latin corpora. In this paper we present the annotation and linking work carried out in the frame of the project Fibon...
Chapter
This paper describes the steps taken to include data from the Lewis & Short bilingual Latin-English dictionary into the Knowledge Base of linguistic resources for Latin LiLa. First, data were extracted from the original XML and matched with entries in LiLa, overcoming ambiguities and structural inconsistencies in the source. Subsequently, senses we...
Presentation
Full-text available
In our talk, we present the structure and the linguistic resources currently included in the LiLa Knowledge Base, i.e. a collection of multifarious textual and lexical resources for Latin described with the same vocabulary of knowledge description and interlinked according to the principles of the Linked Data paradigm. We also present a set of lemm...
Presentation
Full-text available
The presentation describes the EvaLatin 1.0 corpus, which contains the annotated training and evaluation data released for the EvaLatin 2020 evaluation campaign.
Presentation
Full-text available
In this paper we present the methodology followed to extend a Latin sentiment lexicon (called LatinAffectus), the process of inclusion of the lexicon in a knowledge base of interoperable linguistic resources for Latin and one use case performed on the treebank of Dante Alighieri’s Latin works annotated following the Universal Dependencies guideline...
Chapter
Full-text available
This paper describes the steps taken to model a valency lexicon for Latin (Latin Vallex) according to the principles of the Linked Data paradigm, and to interlink its valency frames with the lexical senses recorded in a manually checked subset of the Latin WordNet. The valency lexicon and the WordNet share lexical entries and are part of the LiLa K...
Presentation
Full-text available
This paper presents the early stages of the development of a new treebank containing all of Dante Alighieri’s Latin works. In particular, it describes the conversion of the original TEI-XML files to CoNLL-U, the creation of a gold standard, the process of training four annotators and the evaluation of the syntactic annotation in terms of inter-anno...
Article
Full-text available
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. In addition, we release vectors pre-trained on the “O...
Article
The article describes the creation of UDante, the corpus of Dante Alighieri's Latin texts syntactically annotated according to the criteria established by the Universal Dependencies initiative. After introducing and motivating the annotation style adopted, the article presents in detail the construction phases of UDante, focusing...
Conference Paper
Full-text available
This paper describes the addition of an index of 1,763 Ancient Greek loanwords to the collection of Latin lemmas of the LiLa: Linking Latin Knowledge Base of interoperable linguistic resources. This lexical resource increases LiLa's lemma count and tunes its underlying data model to etymological borrowing.
Conference Paper
Full-text available
This paper presents the early stages of the development of a new treebank containing all of Dante Alighieri's Latin works. In particular, it describes the conversion of the original TEI-XML files to CoNLL-U, the creation of a gold standard, the process of training four annotators and the evaluation of the syntactic annotation in terms of inter-ann...
Article
Full-text available
This paper presents the structure of the LiLa Knowledge Base, i.e. a collection of multifarious linguistic resources for Latin described with the same vocabulary of knowledge description and interlinked according to the principles of the so-called Linked Data paradigm. Following its highly lexically based nature, the core of the LiLa Knowledge Base...
Chapter
Over the last few decades, the widespread diffusion of digital technology has increased availability of primary textual sources, radically changing the everyday life of scholars in the humanities, who are now able to access, query and process a wealth of empirical evidence in ways not possible before. Also for ancient languages, corpora enhanced wi...
Conference Paper
Full-text available
In this paper, we describe the process of inclusion of a prior polarity lexicon of Latin lemmas, called LatinAffectus, in a knowledge base of interoperable linguistic resources developed within the "LiLa: Linking Latin" project. More specifically, a manually-curated list of lemma-sentiment pairs is linked to a comprehensive collection of Latin lemm...
Presentation
Full-text available
Presentation for the 3rd Workshop on Humanities in the Semantic Web (WHiSe), co-located with the 15th Extended Semantic Web Conference (ESWC 2020) and held online on June 2, 2020. The paper has won the Best Paper Award of the workshop. ABSTRACT: In this paper, we describe the process of inclusion of a prior polarity lexicon of Latin lemmas, called...
Article
Full-text available
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. In addition, we release vectors pre-trained on the “O...
Conference Paper
Full-text available
Sentiment lexicons are essential for developing automatic sentiment analysis systems, but the resources currently available mostly cover modern languages. Lexicons for ancient languages are few and not evaluated with high-quality gold standards. However, the study of attitudes and emotions in ancient texts is a growing field of research which poses...
Conference Paper
Full-text available
This paper describes the first edition of EvaLatin, a campaign entirely devoted to the evaluation of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, are aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts t...
Chapter
This chapter presents an investigation into the productivity of words derived from -sc- verbs attested in Classical Latin, as a way of adding to the rich debate on their semantic significance. The work is performed through the use of empirical data provided by the Word Formation Latin lexicon, a derivational morphology resource for Classical and Late...
Chapter
The Word Formation Latin (WFL) project has been awarded a Marie Curie Individual Fellowship to create a language resource consisting of a derivational morphological lexicon of the Latin language, which connects lexical elements on the basis of word formation rules. In WFL, lexemes are segmented and analysed into their derivational morphological com...
Poster
Full-text available
This paper describes a preliminary expansion and assessment of the Latin WordNet for the purposes of the LiLa: Linking Latin project. The objective of this study is to better understand the implications of expanding and evaluating the sense coverage of the Latin WordNet, with a view to identifying the most effective method for its refinement and in...
Presentation
Full-text available
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. A qualitative evaluation is also performed on the emb...
Conference Paper
Full-text available
This paper describes a preliminary expansion and assessment of the Latin WordNet for the purposes of the LiLa: Linking Latin project. The objective of this study is to better understand the implications of expanding and evaluating the sense coverage of the Latin WordNet, with a view to identifying the most effective method for its refinement and in...
Conference Paper
Full-text available
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. A qualitative evaluation is also performed on the em...
Chapter
Full-text available
The paper introduces the project of the Index Thomisticus Treebank (IT-TB). The IT-TB is a dependency-based treebank built on the corpus of the Index Thomisticus (IT) by Father Roberto Busa, which includes the opera omnia of Thomas Aquinas, for a total of approximately 11 million words. Currently, the IT-TB is the largest Latin treebank available,...
Conference Paper
Full-text available
The LiLa: Linking Latin project was recently awarded funding from the European Research Council to build a Knowledge Base of linguistic resources for Latin. LiLa responds to the growing need in the fields of Computational Linguistics, Humanities Computing and Classics to create an interoperable ecosystem of resources and Natural Language Processing...
Book
This book gathers, and makes available in English, with new introductions, previously out of print or otherwise difficult to access articles by Fr Roberto Busa S.J. (1913 - 2011). Also included is a comprehensive bibliography of Busa, an oral history interview with Busa's translator, and a substantial new chapter that evaluates Busa's contributions...
Conference Paper
Full-text available
This article describes a computational text reuse study on Latin texts designed to evaluate the performance of TRACER, a language-agnostic text reuse detection engine. As a case study, we use the Index Thomisticus as a gold standard to measure the performance of the tool in identifying text reuse between Thomas Aquinas' Summa contra Gentiles and h...
Conference Paper
Full-text available
This paper describes the changes applied to the original process used to convert the Index Thomisticus Treebank, a corpus including texts in Medieval Latin by Thomas Aquinas, into the annotation style of Universal Dependencies. The changes are made both to harmonise the Universal Dependencies version of the Index Thomisticus Treebank with the two o...
Article
Full-text available
In the context of the Index Thomisticus Treebank project, we have enhanced the full text of Bellum Catilinae by Sallust with semantic annotation. The annotation style resembles the one used for the so-called “tectogrammatical” layer of the Prague Dependency Treebank. By exploiting the results of semantic role labeling, ellipsis resolution and coref...
Chapter
Full-text available
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference seri...
Chapter
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference seri...
Chapter
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference seri...
Conference Paper
Full-text available
We present a paradigm-based inflected lexicon of Latin verbs built to provide empirical evidence supporting an entropy-based estimation of the degree of uncertainty in inflectional paradigms. The lexicon contains information on the inflected forms that occupy the 254 morphologically possible paradigm cells of 3,348 verbal lexemes extracted...
Poster
Full-text available
This paper examines the diachronic distribution of one of the richest classes of nouns in Latin, namely those ending in -io. The work is performed through the combined use of a morphological analyser for Latin (Lemlat) and a database collecting all word forms occurring through different periods of the Latin language (TF-CILF).
Book
Full-text available
Recent years have seen a growing interest in research aimed at building new linguistic resources and Natural Language Processing (NLP) tools for derivational morphology. The current increased interest in both the theoretical and applicative aspects of word formation is strictly connected to the large need for automatic semantic processing of lingui...
Conference Paper
The recent enhancement of the morphological analyser for Latin Lemlat with a large Onomasticon enables us to analyse both the morphology and the distribution of loanwords in the Latin lexicon. In this paper, we first describe the categories of proper names that could not be inserted into Lemlat automatically, showing that a large part of them...
Conference Paper
This paper investigates the distribution of word formation data through network visualisation, as an entry point for the exploration / analysis of productivity in affixal derivation in Classical Latin. This study uses data from the Word Formation Latin lexicon, a derivational morphology resource for Latin, where entries are analysed into their form...
Conference Paper
This paper describes how to turn a Latin dependency treebank into queryable information so that it can be browsed online using a tree query engine and its web interface. The annotation layers of the treebank are first introduced, then the query system architecture is detailed, and finally the way the treebank is converted into a relational database...
Conference Paper
This paper introduces the main components of the downloadable package of the 3.0 version of the morphological analyser for Latin Lemlat. The processes of word form analysis and treatment of spelling variation performed by the tool are detailed, as well as the different output formats and the connection of the results with a recently built resource...
Chapter
The series publishes the proceedings of the annual conference on Computational Linguistics (CLiC-it), which aims to serve as a reference forum for discussion in the field of computational linguistics research. The proceedings include contributions on natural language processing, covering theoretical and methodological reflections on the topic...
Article
Full-text available
Different lexical resources may pursue different views on lexical meaning. However, all of them deal with lexical items as common basic components, which are described according to criteria that may vary from one resource to another. In this paper, we present a method for measuring the degree of similarity between a valency-based lexical resource an...
Article
Full-text available
The paper evaluates the differences between two currently leading annotation schemes for dependency treebanks. By relying on four treebanks, we demonstrate that the treatment of conjunctions and adpositions represents the core difference between the two schemes and that this impacts the topological properties of the linguistic networks induced from...
Article
In Ancient Greek, as well as in other languages, whenever agreement is triggered by two or more coordinated phrases, two different constructions are allowed: either the agreement can be controlled by the coordinated phrase as a whole, or it can be triggered by just one of the coordinated words. In spite of the amount of information that can be read...
Chapter
The annual conference CLiC-it ("Italian Conference on Computational Linguistics") is an initiative of the "Italian Association of Computational Linguistics" (AILC, www.ai-lc.it) which is intended to meet the need for a national and international forum for the promotion and dissemination of high-level original research in the field of Computati...
Chapter
The annual conference CLiC-it ("Italian Conference on Computational Linguistics") is an initiative of the "Italian Association of Computational Linguistics" (AILC, www.ai-lc.it) which is intended to meet the need for a national and international forum for the promotion and dissemination of high-level original research in the field of Computati...
Article
Network theory provides a suitable framework to model the structure of languages as complex systems with deep relations between their components. Based on a network automatically built from a Latin dependency treebank that includes works of Thomas Aquinas, this paper applies methods for network analysis to show the key role of the verb sum (to be)...
Chapter
Full-text available
CLiC-it 2015 is held in Trento on December 3-4, 2015, hosted and locally organized by Fondazione Bruno Kessler (FBK), one of the most important Italian research centers in CL. The organization of the conference is the result of a fruitful joint effort of different research groups (Università di Torino, Università di Roma Tor Vergata a...
Book
We apply word hierarchical clustering techniques to collect the occurrences of the lemma forma that show a similar contextual behaviour in the works of Thomas Aquinas into the same or closely related groups. Our results will support the lexicographers of a data-driven new lexicon of Thomas Aquinas in their task of writing the lexical entry of forma...
Conference Paper
Full-text available
Assuming that collaboration between theoretical and computational linguistics is essential in projects aimed at developing language resources like annotated corpora, this paper presents the first steps of the semantic annotation of the Index Thomisticus Treebank, a dependency-based treebank of Medieval Latin. The semantic layer of annotation of the...
Book
We present a lexical-based investigation into the corpus of the opera omnia of Seneca. By applying a number of statistical techniques to textual data we aim to automatically collect similar texts into closely related groups. We demonstrate that our objective and unsupervised method is able to distinguish the texts by work and genre.
Article
Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), i-ii. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronical...
Article
Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), iii-iv. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronic...
Article
Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electro...
Article
Full-text available
We present an overview of the Index Thomisticus Treebank project (IT-TB). The IT-TB consists of around 60,000 tokens from the Index Thomisticus by Roberto Busa SJ, an 11-million-token Latin corpus of the texts by Thomas Aquinas. We briefly describe the annotation guidelines, shared with the Latin Dependency Treebank (LDT). The application of data...
Conference Paper
Full-text available
We present a valency lexicon for Latin verbs extracted from the Index Thomisticus Treebank, a syntactically annotated corpus of Medieval Latin texts by Thomas Aquinas. In our corpus-based approach, the lexicon reflects the empirical evidence of the source data. Verbal arguments are induced directly from annotated data. The lexicon contains 43...
