
Monica Monachini - Research Director at Italian National Research Council
About
112 Publications
15,111 Reads
1,551 Citations
Publications (112)
This poster describes the objectives of the H2IOSC project, Humanities and Heritage Open Science Cloud, which aims to build a federated and inclusive cluster of research infrastructures in the ESFRI domain of social and cultural innovation, intended to support researchers across the various disciplines in the humanities, language technologies and cultural...
Oral archives and digital technologies have gone hand-in-hand for a very long time. Both sides benefit from this interdisciplinary junction: technology enhances the preservation and diffusion of oral materials, while exploiting them to develop cutting-edge tools for their treatment. This chapter deals with an Italian instantiation of this mutual re...
This paper presents Cretan Institutional Inscriptions, a resource in the domain of Digital Epigraphy developed at the Ca’ Foscari University of Venice and supported by CLARIN-IT as part of its actions addressed to initiatives, projects and events in the field of Social Sciences and Humanities. The paper begins with a brief outline of the project wi...
Over the course of the last few years, lexicography has witnessed the burgeoning of increasingly reliable automatic approaches supporting the creation of lexicographic resources such as dictionaries, lexical knowledge bases and annotated datasets. In fact, recent achievements in the field of Natural Language Processing and particularly in Word Sens...
Archives often include documents that can hardly be considered publications or grey literature as such, yet they maintain their documentary value and serve as primary sources for specialists. These documents, indeed, can help archivists to reveal the sedimentation process of the archive itself and to preserve the authentic context of the...
Audio and audiovisual archives are at the crossroads of different fields of knowledge, yet they require common solutions for both their long-term preservation and their description, availability, use and reuse. Archivio Vi.Vo. is an Italian project financed by the Tuscany Region, aiming to (i) explore methods for long-term preservation and secure a...
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover,...
Ancient Greek poetry is an essential part of the western cultural heritage; thus, it is important that people have access to its texts and whatever relates to their understanding in a reliable and easy way. Whenever user evaluation is concerned, mock-ups are used by designers to acquire feedback from users. A mock-up is defined as a model of the fi...
Ancient Greek studies, and Classics in general, is a perfect field of investigation in Digital Humanities. Indeed, DH approaches could become a means of building models for complex realities, analyzing them with computational methods and sharing the results with a broader public. Ancient texts have a complex tradition, which includes many witnesses...
This paper gives an overview of the Italian national CLARIN consortium as it currently stands two years after its creation at the end of 2015. It thus discusses the current state of affairs of the consortium on several aspects, especially with regard to members. It also discusses the events and initiatives that have been undertaken, as well as the...
https://www.clarin.eu/sites/default/files/Monachini-Nicolosi-Stefanini-CLARIN2017_paper_3.pdf
In this article we describe our ongoing attempts to use the Semantic Web Rule Language (SWRL) to model the morphological layer of a wide-coverage Italian lexical resource, Parole-Simple-Clips (PSC); in this case that subset of PSC dealing with Italian noun morphology. After giving a brief introduction to SWRL and to Italian noun morphology we go on...
On 1 October 2015, MIUR signed Italy's membership of CLARIN-ERIC, the research infrastructure that offers language resources and technologies dedicated to the language sciences and to the humanities and social sciences. This article aims to give the Italian community a broad overview of CLARIN, its mission, its pillars, ...
This chapter presents a computer platform supporting a Marine Information and Knowledge System based on a repository that gathers, classifies and structures marine scientific literature and data, guaranteeing their accessibility by means of standard protocols. This requires access to quality-controlled data and to information that is provided in...
This paper illustrates the transformation of GeoNames’ ontology concepts, with their English labels and glosses, into a GeoDomain WordNet-like resource in English, its translation into Italian, and its linking to the existing generic WordNets of both languages. The paper describes the criteria used for the linking of domain synsets to each other an...
In this paper we present ongoing research carried out at the Institute for Computational Linguistics “A. Zampolli” (ILC) in Pisa. The institute has been active for many years in the field of Digital Humanities, providing resources, tools and solutions to address the issues facing digital humanists. Starting from those previous initiatives, we show...
EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for the Italian language: since 2007 shared tasks have been proposed covering the analysis of both written and spoken language with the aim of enhancing the development and dissemination of resources and technologies for Italian. EVALITA is an initiative of the Itali...
This paper describes the publication and linking of (parts of) PAROLE SIMPLE CLIPS (PSC), a large scale Italian lexicon, to the Semantic Web and the Linked Data cloud using the lemon model. The main challenge of the conversion is discussed, namely the reconciliation between the PSC semantic structure which contains richly encoded semantic informati...
This work presents a proposal for the development of a natural language processing module for event and temporal analysis of biographies as available in Wikipedia. At the current level of development, we restricted the extraction to temporally anchored events as they represent salient information which can be further used to extract additional even...
The MAPS (Marine Planning and Service Platform) project is a development of the Marine project (Ricerca Industriale e Sviluppo Sperimentale Regione Liguria 2007-2013) aiming at building a computer platform for supporting a Marine Information and Knowledge System, as part of the data management activities. One of the main objectives of the project is...
CLiC-it 2015 is held in Trento on December 3-4 2015, hosted and locally organized by Fondazione Bruno Kessler (FBK), one of the most important Italian research centers in CL. The organization of the conference is the result of a fruitful conjoint effort of different research groups (Università di Torino, Università di Roma Tor Vergata a...
The main purpose of this paper is to serve as a landmark for future research and in particular for future strategic, infrastructural and coordination initiatives. It presents a preliminary plan for actions and infrastructures that could become the basis for future initiatives in the sector of Language Resources and Technologies (LRTs). The FLaReNet...
This report describes the EVENTI (EValuation of Events aNd Temporal Information) task organized within the EVALITA 2014 evaluation campaign. The EVENTI task aims at evaluating the performance of Temporal Information Processing systems on a corpus of Italian news articles. Motivations for the task, datasets, evaluation metrics, and results obtained...
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative's work throughout Europe in order to boost progress a...
Action verbs have many meanings, covering actions in different ontological types. Moreover, each language categorizes action in its own way. One verb can refer to many different actions and one action can be identified by more than one verb. The range of variations within and across languages is largely unknown, causing trouble for natural language...
This contribution aims at highlighting the strong interconnection between lexicons, terminologies and ontologies and especially the fundamental role that ontologies and lexica mutually play. Our view is that lexical resources are evolving in nature: from ontologically based lexicons we are going towards lexically based ontologies. We explore diffe...
Action verbs express important information in a sentence and they are the most frequent elements in speech, but they are also one of the most difficult parts of the lexicon to learn for L2 language learners, because languages segment these concepts in very different ways. The two sentences "Mary folds her shirt" and "Mary folds her arms" refer to tw...
http://www.aclweb.org/anthology/W/W13/#5400
The papers in this volume represent some of the most recent and exciting work being carried out both within the framework of Generative Lexicon and related approaches to the lexicon and lexical resources. With the recent emphasis in natural language processing on the development of machine learning algori...
Wordnet-lexical markup framework (LMF) is an instantiation of LMF for representing Wordnet-like semantic dictionaries. Wordnet is a widely accepted resource and thus provides a good case for testing the viability of a representation in LMF and the acceptance by a wide range of users. Wordnet-LMF was developed in the framework of the EU project KYOT...
The importance of designing standards for language resources (LR) is firmly established, starting with the Expert Advisory Group for Language Engineering (EAGLES) and International Standards for Language Engineering (ISLE) initiatives. Both EAGLES and ISLE stress the importance of reaching a consensus on (linguistic and nonlinguistic) “content”, in...
This chapter argues that the lexical markup framework (LMF) can play a significant role in realizing servicized lexical resources on the Web. To accomplish this goal, it begins with a brief introduction of the notion of servicized resources, and then presents a technical architecture of, what is called, LMF-aware lexicon access services. It present...
To make the vision of a European Information Infrastructure and of the Semantic Web a reality, two key issues are tackled: (i) content, which must be dealt with in a multilingual environment; (ii) standards, which are critical to achieve interoperability and integration. In the Semantic Web scenario, ontologies are the key components to manage know...
In the last 20 years dictionaries and lexicographic resources such as WordNet have started to be enriched with multimodal content. Short videos depicting basic actions support the user’s need (especially in second language acquisition) to fully understand the range of applicability of verbs. The IMAGACT project has among its results a repository of...
This paper demonstrates that Wordnet-LMF, a version of ISO LMF, allows us to effectively design and implement Web services for accessing WordNet-type semantic lexicons that conform to the REST Web service architecture. The implemented prototype service currently provides access to native wordnets as well as to a bilingual concept dictionary. This p...
Action verbs, which are highly frequent in speech, cause disambiguation problems that are relevant to Language Technologies. This is a consequence of the peculiar way each natural language categorizes Action i.e. it is a consequence of semantic factors. Action verbs are frequently “general”, since they extend productively to actions belonging to di...
The FLaReNet Strategic Agenda highlights the most pressing needs for the sector of Language Resources and Technologies and presents a set of recommendations for its development and progress in Europe, as issued from a three-year consultation of the FLaReNet European project. The FLaReNet recommendations are organised around nine dimensions: a) docu...
Action verbs are the least predictable linguistic type for bilingual dictionaries and they cause major problems for NLP technologies. This is not only because of language-specific phraseology, but rather a consequence of the peculiar way each language categorizes events. In ordinary languages the most frequent action verbs are “general”, sinc...
This paper presents the IMAGACT annotation infrastructure which uses both corpus-based and competence-based methods for the simultaneous extraction of a language independent Action ontology from English and Italian spontaneous speech corpora. The infrastructure relies on an innovative methodology based on images of prototypical scenes and will iden...
This paper presents a metadata model for the description of language resources proposed in the framework of the META-SHARE infrastructure, aiming to cover both datasets and tools/technologies used for their processing. It places the model in the overall framework of metadata models, describes the basic principles and features of the model, elaborat...
Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events...
The Language Grid is a distinctive language service infrastructure in the sense that it accommodates a wide variety of user needs, ranging from technical novices to experts; language resource consumers to language resource providers. As these language services are various in type and each of them can be idiosyncratic in many aspects, the service in...
This document proposes an overview of the current (at the time of writing) scene towards an Interoperability Framework and acts as a reference point for the standards that our community supports. This initiative is in close synchronization with other relevant initiatives such as CLARIN, ELRA, ISO and TEI and META-Share. The document builds on th...
This paper presents the metadata schema for describing language resources (LRs) currently under development for the needs of META-SHARE, an open distributed facility for the exchange and sharing of LRs. An essential ingredient in its setup is the existence of formal and standardized LR descriptions, cornerstone of the interoperability layer of an...
This paper reports on recent activities carried out within the KYOTO project aimed at enhancing the Italian WordNet Language Resource. On the one hand we study the formalisation of this lexicon according to the LMF ISO standard and explore its application into a real-world scenario by means of representing it in the WN-LMF dialect. On the other...
This paper studies the importance of qualia relations for Word Sense Disambiguation (WSD). We use a graph-based WSD algorithm over the Italian WordNet and evaluate it when adding different kinds of qualia relations (agentive, constitutive, formal and telic) taken from PAROLE-SIMPLE-CLIPS (PSC), a Language Resource based on the Generative Lexico...
We have successfully adapted and extended the automatic Multilingual, Interoperable Named Entity Lexicon approach to Arabic, using Arabic WordNet (AWN) and Arabic Wikipedia (AWK). First, we extract AWN’s instantiable nouns and identify the corresponding categories and hyponym subcategories in AWK. Then, we exploit Wikipedia inter-lingual links to l...
This document describes the preliminary release of the integrated Kyoto system for specific domain WSD. The system uses concept miners (Tybots) to extract domain-related terms and produces a domain-related thesaurus, followed by knowledge-based WSD based on wordnet graphs (UKB). The resulting system can be applied to any language with a lexica...
This paper describes a Web service for accessing WordNet-type semantic lexicons. The central idea behind the service design is: given a query, the primary functionality of lexicon access is to present a partial lexicon by extracting the relevant part of the target lexicon. Based on this idea, we implemented the system as a RESTful Web service whose...
In this paper we present a Web Service Architecture for managing high level interoperability of Language Resources (LRs) by means of a Service Oriented Architecture (SOA) and the use of ISO standards, such as ISO LMF. We propose a layered architecture which separates the management of legacy resources (data collection) from data aggregation (wo...
This paper reports on a prototype multilingual query expansion system relying on LMF-compliant lexical resources. The system is one of the deliverables of a three-year project aiming at establishing an international standard for language resources which is applicable to Asian languages. Our important contributions to ISO 24613, standard Lexical Markup F...
In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case-study investigating the needs and requirements of semi-automatic integra...
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting Natural Language Processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that the production of a consensual specification on multilingual lexicons c...
In this paper we present Wordnet-LMF, a dialect of ISO Lexical Markup Framework that instantiates LMF for representing wordnets. Wordnet-LMF was developed in the framework of the EU KYOTO project for the specific purpose of endowing a set of wordnets with a standardized interoperability format allowing the interchange of lexico-semantic information...
This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (3) the use of standards to improve i...
We present KAF, the KYOTO Annotation Format. KAF is a layered and extendible linguistic annotation format that is specifically developed to arrive at semantic interoperability. KAF is used in seven languages in several applications throughout the KYOTO (Knowledge Yielding Ontologies for Transition-based Organization) project. The goal of these...
This research deals with the modelling of a Generative Lexicon based ontology to be used in the Semantic Web and Natural Language Processing semantic tasks. This ontology is imported from an existing computational Lexical Resource and is converted to the W3C standard Web Ontology Language. This presents some challenges, as for example the multi...
We outline work to be carried out within the framework of an impending EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology) anchored in a language-independent ontology that is linked to wordnets in several languages. For each language, information extraction and identification of...
This paper presents the automatic extension of Princeton WordNet with Named Entities (NEs). This new resource is called Named Entity WordNet. Our method maps the noun is-a hierarchy of WordNet to Wikipedia categories, identifies the NEs present in the latter and extracts different information from them such as written variants, definitions, etc. Th...
In this paper we address the issue of developing an interoperable infrastructure for language resources and technologies. In our approach, called UFRA, we extend the Federate Database Architecture System adding typical functionalities coming from UIMA. In this way, we capitalize on the advantages of a federated architecture, such as autonomy, heterog...
This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting for support to information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale...
After a brief presentation of the data model, we describe a work in progress to define an initial set of morpho-syntactic and syntactic data categories dedicated to NLP applications. The aim is to improve interoperability among language resources and to optimize the process leading to their integration in applications. The main point is to be sure...
This paper describes the design, implementation and population of the BioLexicon in the framework of BootStrep, an FP6 project. The BioLexicon (BL) is a lexical resource designed for text mining in the bio-domain. It has been conceived to meet both domain requirements and upcoming ISO standards for lexical representation. The data model and data ca...
Given a situation where human language technologies have been maturing considerably and a rapidly growing range of language data resources being now available, together with natural language processing (NLP) tools/systems, a strong need for a global language infrastructure (GLI) is becoming more and more evident, if one wants to ens...
This paper deals with the relations between ontologies and lexicons. We study the role of these two components and their evolution during the last years in the field of Computational Linguistics. Subsequently, we survey the current lines of research at ILC-CNR which tackle this topic. They involve (I) the reuse of already existing Lexical Resourc...
The present paper describes a large-scale lexical resource for the biology domain designed both for human and for machine use. This lexicon aims at semantic interoperability and extendability, through the adoption of ISO-LMF standard for lexical representation and through a granular and distributed encoding of relevant information. The first part o...
The present paper describes the model and database structure of a large-scale lexical resource for the biology domain designed both for human and for machine use. Our lexicon aims at semantic interoperability and extendability. This is achieved through the adoption of the up-coming ISO standard for lexical representation and through a granular and...
The present work falls in the line of activities promoted by the European Language Resources Association (ELRA) Production Committee (PCom) and raises issues in methods, procedures and tools for the reusability, creation, and management of Language Resources. A two-fold purpose lies behind this experiment. The first aim is to investigate the feasi...
Enhancing the development of multilingual lexicons is of foremost importance for intercultural collaboration to take place, as multilingual lexicons are the cornerstone of several multilingual applications. However, the development and maintenance of large-scale, robust multilingual dictionaries is a tantalizing task. In this paper we present a too...
This paper assumes that the linguistic side of terminologies is necessarily partially informed by the knowledge of the specific domain and claims that semantic relations, especially those accounting for the syntagmatic relations of words in context, are crucial for the representation of this kind of information. This paper also argues that the priv...
This paper presents on-going research to formalise the ontology of a computational lexicon in OWL (W3C standard) as well as to enrich it by applying a bottom-up approach that extracts semantic information from the lexicon. The resource used follows the Generative Lexicon (GL) theory and therefore (1) puts a challenge to ontology design as i...
This paper presents a case study concerning the challenges and requirements posed by next generation language resources, realized as an overall model of open, distributed and collaborative language infrastructure. If a sort of "new paradigm" is required, we think that the emerging and still evolving technology connected to Grid computing is a very...
In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case-study investigating the needs and requirements of semi-automatic integra...
This demo presents LeXFlow, a workflow management system for cross-fertilization of computational lexicons. Borrowing from techniques used in the domain of document workflows, we model the activity of lexicon management as a set of workflow types, where lexical entries move across agents in the process of being dynamically updated. A prototype...
As an area of great linguistic and cultural diversity, Asian language resources have received much less attention than their western counterparts. Creating a common standard for Asian language resources that is compatible with an international standard has at least three strong advantages: to increase the competitive edge of Asian countries...
In this paper, we show the importance of standards as an essential aspect for any research infrastructure in the humanities. In the context of the current activities within ISO committee TC 37/SC 4 (Language Resource Management), we show in particular how important it is to provide means to compare linguistic representations through the use of a sh...
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting Natural Language Processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that the production of a consensual specification on lexicons can be a useful ai...
Availability of linguistic resources for the development of human language technology applications is nowadays recognized as a critical issue with both political and economic impact and implications on the sphere of cultural identity. This paper reports on the experience gained during the INTERA European project for the production of multilingua...
This paper describes the results of work carried out for ELRA during 2003-2004. It describes the methodology for validation of written language resources (WLRs), specifically lexica, which has been developed for ELRA and tested on a few resources in the ELRA catalogue. It discusses the importance of key issues in lexicon creation and validation such as th...
The paper tackles the issue of content interoperability among lexical resources, by presenting an experiment of mapping differently conceived lexicons, FrameNet and NOMLEX, onto MILE (Multilingual ISLE Lexical Entry), a meta-entry for the encoding of multilingual lexical information, acting as a general schema of shared and common lexical objects....
This volume is a collection of original contributions from outstanding scholars in linguistics, philosophy and computational linguistics exploring the relation between word meaning and human linguistic creativity. The papers present different aspects surrounding the question of what is word meaning, a problem that has been the centre of heated deba...
The project LE-SIMPLE is an innovative attempt at building harmonized syntactic-semantic lexicons for twelve European languages, intended for use in different Human Language Technology applications. SIMPLE provides a general design model for the encoding of a large amount of semantic information, ranging from ontological typing, to argument structur...
Lexicons, as described in the previous chapter, are a valuable resource, not only for wordclass tagging but also for many other applications in the broad area of language engineering (LE), which encompasses fields such as computational linguistics and Natural Language Processing (NLP). Furthermore, the last decade in particular has seen an increasi...
An attempt to integrate different techniques and various perspectives on lexical knowledge acquisition from text corpora is illustrated. In this program we use three distinct methodologies to handle text data, summarized as follows: (1) Simple and traditional stochastic techniques working on pairs of words. (2) A lexicographic approach guided by th...
Lexical Markup Framework (LMF) is a model that provides a common standardized framework for Natural Language Processing (NLP) lexicons. The goals of LMF are to provide a common model for the creation and use of such lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of a large number of in...
This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project and the methodology adopted for their production. Special emphasis is given to the reality factors that have influenced the MLRs development...