Project

LiLa: Linking Latin

Goal: The LiLa: Linking Latin (2018-2023) project was awarded funding from the European Research Council (ERC) to build a knowledge base of linguistic resources for Latin. LiLa responds to the growing need in the fields of Computational Linguistics and Humanities Computing to create an interoperable ecosystem of NLP tools and resources for the automatic processing of Latin. To this end, LiLa makes use of Linked Open Data (LOD) practices and standards to connect words to distributed textual and lexical resources alike.

Project website: https://lila-erc.eu


Project log

Rachele Sprugnoli
added 2 research items
This contribution presents the current status of the ERC project “LiLa: Linking Latin”, the main objective of which is to connect and exploit the wealth of existing linguistic resources for Latin by making them interoperable, through the creation of a Knowledge Base following Linked Data standards. We describe the textual and lexical resources linked to the Knowledge Base and the ways in which it is possible to query and explore them.
This contribution presents the first steps towards the analysis of Leonardo Fibonacci's Liber Abbaci using computational linguistics methods. The work is currently carried out in the context of a joint research project between the Tuscany Region and the University of Pisa with the help of an interdisciplinary team.
Rachele Sprugnoli
added a research item
Presentation at the Digital Dante Days, a two-day international symposium on the past, present and future of digital scholarship on Dante’s work, organized by "Dipartimento di Studi Umanistici", VEDPH, Ca' Foscari University.
Rachele Sprugnoli
added a research item
In our talk, we present the structure and the linguistic resources currently included in the LiLa Knowledge Base, i.e. a collection of multifarious textual and lexical resources for Latin described with the same vocabulary of knowledge description and interlinked according to the principles of the Linked Data paradigm. We also present a set of lemma embeddings for Latin and a couple of experiments using such embeddings for inducing sentiment lexicons and for analyzing diachronic language change in two Latin corpora.
Eleonora Litta Modignani Picozzi
added a research item
This paper describes the steps taken to model a valency lexicon for Latin (Latin Vallex) according to the principles of the Linked Data paradigm, and to interlink its valency frames with the lexical senses recorded in a manually checked subset of the Latin WordNet. The valency lexicon and the WordNet share lexical entries and are part of the LiLa Knowledge Base, which interlinks multiple linguistic resources for Latin. After describing the overall architecture of LiLa, as well as the structure of the lexical entries of Latin Vallex and Latin WordNet, the paper focuses on how valency frames have been modeled in LiLa, in line with a submodule of the Predicate Model for Ontologies (PreMOn) specifically created for the representation of grammatical valency. A mapping of the valency frames and the WordNet synsets assigned to the lexical entries shared by the two resources is detailed, as well as a number of queries that can be run across the interoperable resources for Latin currently included in LiLa.
Rachele Sprugnoli
added a research item
The presentation describes the EvaLatin 1.0 corpus, which contains the annotated training and evaluation data released for the EvaLatin 2020 evaluation campaign.
Rachele Sprugnoli
added a research item
In this paper we present the methodology followed to extend a Latin sentiment lexicon (called LatinAffectus), the process of inclusion of the lexicon in a knowledge base of interoperable linguistic resources for Latin and one use case performed on the treebank of Dante Alighieri’s Latin works annotated following the Universal Dependencies guidelines. In addition, we report on our first attempt at linking the polarity scores of SentiWordNet 3.0 to a manually revised version of Latin WordNet.
Rachele Sprugnoli
added a research item
Presentation given for the PhD course "Emotion-oriented systems" at the University of Turin. Content updated with respect to the previous presentation entitled "Sentiment Analysis for Latin: a Journey from Seneca to Thomas Aquinas".
Rachele Sprugnoli
added a research item
This paper presents the early stages of the development of a new treebank containing all of Dante Alighieri’s Latin works. In particular, it describes the conversion of the original TEI-XML files to CoNLL-U, the creation of a gold standard, the process of training four annotators and the evaluation of the syntactic annotation in terms of inter-annotator agreement and LA, UAS and LAS. The aim is to release a new resource, in view of the celebrations for the 700th anniversary of Dante’s death, which can support the development of the Vocabolario Dantesco.
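The three evaluation figures named above can be illustrated with a minimal sketch. It assumes that LA stands for label accuracy (correct dependency relation, regardless of head), UAS for unlabeled attachment score (correct head) and LAS for labeled attachment score (correct head and relation). The function name `attachment_scores` and the toy sentence are hypothetical and are not taken from the treebank or from the paper.

```python
from typing import Dict, List, Tuple

# One token's annotation: (index of its head, dependency relation label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Return LA, UAS and LAS for two annotations of the same token sequence."""
    if len(gold) != len(pred):
        raise ValueError("both annotations must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty annotation")
    n = len(gold)
    pairs = list(zip(gold, pred))
    la = sum(g[1] == p[1] for g, p in pairs) / n   # relation label matches
    uas = sum(g[0] == p[0] for g, p in pairs) / n  # head matches
    las = sum(g == p for g, p in pairs) / n        # head and label both match
    return {"LA": la, "UAS": uas, "LAS": las}


# Toy four-token sentence, invented purely for illustration.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "obl")]
scores = attachment_scores(gold, pred)
print(scores)
```

In the toy example the third token has the wrong head and the fourth the wrong label, so LA and UAS are each 0.75 while LAS drops to 0.5.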
Rachele Sprugnoli
added a research item
While the main applications of resources and tools for sentiment analysis typically fall within the scope of fields like customer experience and social media monitoring, there is an increasing interest in extending their range to texts written in ancient and historical languages. Such interest mirrors the substantial growth of the area dedicated to building and using linguistic resources for these languages, which are essential for accessing and understanding the Classical tradition. In this talk, we will present the methodology we followed to create and evaluate a new set of Latin sentiment lexicons, and the process of inclusion of a prior polarity lexicon of Latin lemmas in a knowledge base of interoperable linguistic resources developed within the ERC project “LiLa: Linking Latin”. We will discuss the main challenges we face when working with ancient languages (e.g., lack of native speakers, limited amount of data, unusual textual genres for the sentiment analysis task, such as philosophical or documentary texts) and we will describe two use cases underscoring the importance of an interdisciplinary approach combining computational linguistics, semantic web and humanities practices.
Greta Franzini
added a research item
This paper describes the addition of an index of 1,763 Ancient Greek loan-words to the collection of Latin lemmas of the LiLa: Linking Latin Knowledge Base of interoperable linguistic resources. This lexical resource increases LiLa's lemma count and tunes its underlying data model to etymological borrowing.
Greta Franzini
added a research item
This paper presents the structure of the LiLa Knowledge Base, i.e. a collection of multifarious linguistic resources for Latin described with the same vocabulary of knowledge description and interlinked according to the principles of the so-called Linked Data paradigm. Following its highly lexically based nature, the core of the LiLa Knowledge Base consists of a large collection of Latin lemmas, serving as the backbone to achieve interoperability between the resources, by linking all those entries in lexical resources and tokens in corpora that point to the same lemma. After detailing the architecture supporting LiLa, the paper particularly focusses on how we approach the challenges raised by harmonizing different strategies of lemmatization that can be found in linguistic resources for Latin. As an example of the process to connect a linguistic resource to LiLa, the inclusion in the Knowledge Base of a dependency treebank is described and evaluated.
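The idea of using lemmas as the hub between resources can be sketched in a few lines. This is only an illustration under stated assumptions: the lemma identifiers, the item identifiers, the `LEMMA_BANK` contents and the helper names `build_index` and `link` are all hypothetical, and the real Knowledge Base uses Linked Data identifiers and a far richer model. The sketch shows one harmonization case, two spelling conventions (`uolo` and `volo`) resolving to the same lemma.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical lemma bank: lemma ID -> written representations of that lemma.
LEMMA_BANK: Dict[str, Set[str]] = {
    "lemma/1": {"uolo", "volo"},  # two spelling conventions, one lemma
    "lemma/2": {"sequor"},
}


def build_index(bank: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the bank: written form -> IDs of the lemmas it can represent."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for lemma_id, forms in bank.items():
        for form in forms:
            index[form].add(lemma_id)
    return index


def link(items: List[Tuple[str, str]], index: Dict[str, Set[str]]):
    """Attach each (item ID, lemma string) pair to every matching lemma ID."""
    links: Dict[str, List[str]] = defaultdict(list)
    unlinked: List[str] = []
    for item_id, lemma_string in items:
        lemma_ids = index.get(lemma_string.lower(), set())
        if not lemma_ids:
            unlinked.append(item_id)  # left for manual inspection
        for lemma_id in sorted(lemma_ids):
            links[lemma_id].append(item_id)
    return dict(links), unlinked


# Invented corpus tokens and lexicon entries, each carrying its own lemma string.
treebank_tokens = [("tb:s1-t3", "volo"), ("tb:s2-t1", "sequor")]
lexicon_entries = [("lex:e17", "uolo"), ("lex:e99", "xyz")]
links, unlinked = link(treebank_tokens + lexicon_entries, build_index(LEMMA_BANK))
print(links, unlinked)
```

Because the corpus token and the lexicon entry end up under the same lemma ID, a query starting from either resource can reach the other, which is the kind of interoperability the abstract describes.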
Rachele Sprugnoli
added 2 research items
In this paper, we describe the process of inclusion of a prior polarity lexicon of Latin lemmas, called LatinAffectus, in a knowledge base of interoperable linguistic resources developed within the "LiLa: Linking Latin" project. More specifically, a manually-curated list of lemma-sentiment pairs is linked to a comprehensive collection of Latin lemmas by using Semantic Web and Linked Data standards and practices. LatinAffectus is modeled relying on three formal representation frameworks: Lemon and Ontolex to describe the lexicon, and the Marl ontology to describe the sentiment properties of each of its lexical entries. We present the lexicon, the methodology and the results of the linking process, as well as a use case and the planned future work.
Presentation for the 3rd Workshop on Humanities in the Semantic Web (WHiSe), co-located with the 15th Extended Semantic Web Conference (ESWC 2020) and held online on June 2, 2020. The paper won the Best Paper Award of the workshop.
Rachele Sprugnoli
added 2 research items
This paper describes the first edition of EvaLatin, a campaign entirely devoted to the evaluation of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, are aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set, alongside prose texts by the same authors represented in the training set, also includes data relative to poetry and to the Medieval period. This also allows us to propose the Cross-genre and Cross-time subtasks for each task, in order to evaluate the portability of NLP tools for Latin across different genres and time periods. The results obtained by the participants for each task and subtask are presented and discussed.
Sentiment lexicons are essential for developing automatic sentiment analysis systems, but the resources currently available mostly cover modern languages. Lexicons for ancient languages are few and not evaluated with high-quality gold standards. However, the study of attitudes and emotions in ancient texts is a growing field of research which poses specific issues (e.g., lack of native speakers, limited amount of data, unusual textual genres for the sentiment analysis task, such as philosophical or documentary texts) and can have an impact on the work of scholars coming from several disciplines besides computational linguistics, e.g. historians and philologists. The work presented in this paper aims at providing the research community with a set of sentiment lexicons built by taking advantage of manually-curated resources belonging to the long tradition of Latin corpus and lexicon creation. Our interdisciplinary approach led us to release: i) two automatically generated sentiment lexicons; ii) a Gold Standard developed by two Latin language and culture experts; iii) a Silver Standard in which semantic and derivational relations are exploited so as to extend the list of lexical items of the Gold Standard. In addition, the evaluation procedure is described together with a first application of the lexicons to a Latin tragedy.
Greta Franzini
added an update
We are delighted to announce that our LEMMA BANK QUERY INTERFACE is now online at:
The Lemma Bank comprises 134,228 Latin lemma objects, 58,278 hypolemma objects, as well as 4,224 lexical bases, 109 suffixes and 41 prefixes.
A short video tour of the interface is available here: https://twitter.com/ERC_LiLa/status/1234429008663777280
Feedback always welcome!
 
Greta Franzini
added a research item
This paper describes a preliminary expansion and assessment of the Latin WordNet for the purposes of the LiLa: Linking Latin project. The objective of this study is to better understand the implications of expanding and evaluating the sense coverage of the Latin WordNet, with a view to identifying the most effective method for its refinement and inclusion in the LiLa Knowledge Base of Latin resources. Our test empirically demonstrates the inadequacy for Latin of a common semi-automated approach of expansion and informs potential lines of improvement for the resource.
Rachele Sprugnoli
added a research item
This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. A qualitative evaluation is also performed on the embeddings of rare lemmas. In addition, we release vectors pre-trained on the “Opera Maiora” by Thomas Aquinas, thus providing a resource to analyze Latin in a diachronic perspective.
Greta Franzini
added 2 research items
The LiLa: Linking Latin project was recently awarded funding from the European Research Council to build a Knowledge Base of linguistic resources for Latin. LiLa responds to the growing need in the fields of Computational Linguistics, Humanities Computing and Classics to create an interoperable ecosystem of resources and Natural Language Processing tools for Latin. To this end, LiLa makes use of Linked Open Data practices and standards to connect words to distributed textual and lexical resources via unique identifiers. In so doing, it builds rich knowledge graphs, which can be used for research and teaching purposes alike. This paper details the architecture of the LiLa Knowledge Base and presents the solutions found to address the challenges raised by populating it with a first set of linguistic resources.
This paper describes a preliminary expansion and assessment of the Latin WordNet for the purposes of the LiLa: Linking Latin project. The objective of this study is to better understand the implications of expanding and evaluating the sense coverage of the Latin WordNet, with a view to identifying the most effective method for its refinement and inclusion in the LiLa Knowledge Base of Latin resources. Our test empirically demonstrates the inadequacy for Latin of a common semi-automated approach of expansion and informs potential lines of improvement for the resource.
Greta Franzini
added an update
Passarotti, Marco; Mambrini, Francesco. 2019. Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin. In Friedrich, Annemarie; Zeyrek, Deniz (eds.), Proceedings of the 13th Linguistic Annotation Workshop (LAW XIII), August 1, 2019, Florence, Italy. Association for Computational Linguistics, pp. 71-80. ISBN: 978-1-950737-38-3. Available at: https://sigann.github.io/LAW-XIII-2019/pdf/W19-4009.pdf
 
Greta Franzini
added an update
Passarotti, Marco; Cecchini, Flavio Massimiliano; Litta, Eleonora; Franzini, Greta; Mambrini, Francesco; Ruffolo, Paolo. 2019. LiLa: Linking Latin – A Knowledge Base of Linguistic Resources and NLP Tools. In Proceedings of the 2nd Conference on Language, Data and Knowledge (LDK 2019). Available at: http://ceur-ws.org/Vol-2402/paper2.pdf
 
Greta Franzini
added an update
Our prototype RDF triplestore database is now available online for you to try! The purpose of the database is to allow you to perform queries across different linguistic resources for Latin. The resources currently connected within this LiLa triplestore are LEMLAT, Word Formation Latin and the PROIEL Latin Treebank (Universal Dependencies distribution).
 
Greta Franzini
added an update
On 15th November Universal Dependencies (UD) released version 2.3 of its set of annotated treebanks. Thanks to the work of our very own Flavio M. Cecchini and Marco C. Passarotti, and of our Czech colleague Dan Zeman (Prague), UD now includes a new and improved conversion of the Index Thomisticus Treebank (IT-TB). For more information, visit: https://lila-erc.eu/index-thomisticus-treebank-to-universal-dependencies/
 
Flavio Massimiliano Cecchini
added a research item
This paper describes the changes applied to the original process used to convert the Index Thomisticus Treebank, a corpus including texts in Medieval Latin by Thomas Aquinas, into the annotation style of Universal Dependencies. The changes are made both to harmonise the Universal Dependencies version of the Index Thomisticus Treebank with the two other available Latin treebanks and to fix errors and inconsistencies resulting from the original process. The paper details the treatment of different issues in PoS tagging, lemmatisation and assignment of dependency relations. Finally, it assesses the quality of the new conversion process by providing an evaluation against a gold standard.
Greta Franzini
added a project goal
The LiLa: Linking Latin (2018-2023) project was awarded funding from the European Research Council (ERC) to build a knowledge base of linguistic resources for Latin. LiLa responds to the growing need in the fields of Computational Linguistics and Humanities Computing to create an interoperable ecosystem of NLP tools and resources for the automatic processing of Latin. To this end, LiLa makes use of Linked Open Data (LOD) practices and standards to connect words to distributed textual and lexical resources alike.
Project website: https://lila-erc.eu