Chapter

Developing a large scale FrameNet for Italian: the IFrameNet experience


Abstract

The series publishes the proceedings of the annual Italian Conference on Computational Linguistics (CLiC-it), which aims to be a reference venue for discussion in the field of computational linguistics research. The proceedings include contributions on natural language processing, including theoretical and methodological reflections on the subject, and make an important contribution to this research field. Other main topics include computational linguistics, linguistics, cognitive science, machine learning, computer science, knowledge representation, information retrieval and digital humanities. The conference is organized through the efforts of the Italian Association of Computational Linguistics (AILC http://www.ai-lc.it/), represented each year by some of its organizing members, who are also affiliated with other organizations working in computational linguistics.
... In this paper we thus investigate the applicability of the method proposed in Pennacchiotti et al. (2008) to boosting the coverage of a new and still limited lexical resource for Italian based on Frame Semantics. This resource has been developed within the IFrameNet (IFN) project (Basili et al., 2017), which aims at creating a large-coverage FrameNet-like resource for Italian and at producing a complete dictionary in which every lexical entry is linked to all the frames it can evoke (i.e., the frames for which it is an LU). At present, although the resource counts more than 7,700 lexical items associated with more than 1,048 frames, each lexical item is connected, on average, to only 1.3 frames, which is problematic given the high polysemy of Italian words (Casadei, 2014). ...
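The coverage figures above rest on a simple data structure: a dictionary from lexical entries to the set of frames each one evokes. The sketch below is a minimal illustration assuming a plain Python mapping; the entries and their frame assignments are hypothetical toy data, not actual IFN content. It computes the average number of frames per entry (the kind of statistic behind the reported 1.3) and lists single-frame entries, which are natural candidates for coverage expansion.

```python
from __future__ import annotations

# Minimal sketch of an LU dictionary: lexical entry -> frames it evokes.
# Entries and frame assignments are hypothetical toy data, not IFN data.
LEXICON: dict[str, set[str]] = {
    "comprare": {"Commerce_buy"},
    "vendere": {"Commerce_sell"},
    "correre": {"Self_motion", "Fluidic_motion"},
    "casa": {"Buildings"},
}


def avg_frames_per_entry(lexicon: dict[str, set[str]]) -> float:
    """Average number of frames linked to each lexical entry (0.0 if empty)."""
    if not lexicon:
        return 0.0
    return sum(len(frames) for frames in lexicon.values()) / len(lexicon)


def distinct_frames(lexicon: dict[str, set[str]]) -> set[str]:
    """All frames evoked by at least one entry."""
    return set().union(*lexicon.values())


def single_frame_entries(lexicon: dict[str, set[str]]) -> list[str]:
    """Entries linked to exactly one frame, sorted alphabetically."""
    return sorted(e for e, frames in lexicon.items() if len(frames) == 1)


def entries_evoking(lexicon: dict[str, set[str]], frame: str) -> list[str]:
    """Inverse lookup: entries that are LUs of the given frame."""
    return sorted(e for e, frames in lexicon.items() if frame in frames)


if __name__ == "__main__":
    print(avg_frames_per_entry(LEXICON))   # 1.25
    print(len(distinct_frames(LEXICON)))   # 5
    print(single_frame_entries(LEXICON))   # ['casa', 'comprare', 'vendere']
```

Storing frames as sets keeps the "all frames an entry can evoke" relation free of duplicates, and the inverse lookup recovers the frame-centred view (all LUs of a frame) without a second index.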
... The IFrameNet project (Basili et al., 2017) relied, as a starting point, on the results of previous research on developing Italian resources annotated according to Frame Semantics (De Cao et al., 2010), i.e., a set of automatically induced LUs covering 554 of the 1,224 frames in FrameNet. ...
... Despite the significant efforts made to create FrameNets for other languages, the results, though promising, are far from definitive. The Italian language, which was our first target case in the development of data extraction tools, still lacks resources that are stable or sufficiently rich in semantic annotation [2,11,16,21]. Moreover, the maintainers of the original FrameNet are themselves asking not only to what extent the semantic frames developed for English are appropriate for other languages, but also under what circumstances frames may be considered cross-linguistic or, in other words, universal. ...
Chapter
The definition of alternative processing techniques for business documents is inevitably at odds with long-standing issues arising from the unstructured nature of most business-related information. In particular, increasingly refined methods for automated data extraction have been investigated over the years. The latest frontier in this direction is Semantic Role Labeling (SRL), which extracts relevant information based purely on the overall meaning of sentences. It does so by mapping the specific situations described in a text onto more general scenarios (semantic frames). FrameNet originated as a semantic frame repository by applying SRL techniques to large textual corpora, but adapting it to languages other than English has proven difficult. In this paper, we introduce a new implementation of SRL for information extraction, called Verb-Based SRL (VBSRL). VBSRL relies on a different conceptual theory from the field of natural language understanding, one that is language-independent and gives verbs a far more central role in abstracting from real-life situations.
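To make "mapping specific situations onto general scenarios" concrete, the sketch below shows one plausible shape for frame-semantic output used in information extraction. It is not VBSRL, whose internals the abstract does not describe: the toy lexicon, the `FrameAnnotation` class and the `to_record` helper are hypothetical, while the frame and role names follow FrameNet's Commerce_buy frame. Role fillers are written by hand, standing in for the output of an SRL system.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy lexicon: lemma -> frames it can evoke.
FRAME_LEXICON: dict[str, list[str]] = {
    "buy": ["Commerce_buy"],
    "purchase": ["Commerce_buy"],
    "sell": ["Commerce_sell"],
}


@dataclass
class FrameAnnotation:
    """One frame instance: evoked frame, target word, and role fillers."""
    frame: str
    target: str
    elements: dict[str, str] = field(default_factory=dict)


def candidate_frames(lemma: str) -> list[str]:
    """Frames a lemma may evoke; empty if the lemma is not in the lexicon."""
    return FRAME_LEXICON.get(lemma, [])


def to_record(ann: FrameAnnotation) -> dict[str, str]:
    """Flatten an annotation into a record, dropping the surface target word."""
    return {"frame": ann.frame, **ann.elements}


# Two differently worded sentences, annotated by hand as an SRL system might:
# "Mary bought a car from John" / "Mary purchased a car from John".
a1 = FrameAnnotation("Commerce_buy", "bought",
                     {"Buyer": "Mary", "Goods": "a car", "Seller": "John"})
a2 = FrameAnnotation("Commerce_buy", "purchased",
                     {"Buyer": "Mary", "Goods": "a car", "Seller": "John"})

if __name__ == "__main__":
    # Both sentences reduce to the same general scenario.
    print(to_record(a1) == to_record(a2))  # True
```

The point of the flattened record is that downstream extraction code sees the scenario (who bought what from whom) rather than the particular verb used to describe it.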
... Starting from the English FrameNet, developed at Berkeley in the late 1990s on the basis of the "frame semantics" proposed by the linguist Charles Fillmore, Emanuele proposed building an Italian version, reusing where possible the annotation projection techniques already tested in MultiWordNet. The annotated resource, which he coordinated, was released to the scientific community and still constitutes one of the core components of FrameNet for Italian (Basili et al. 2017), an ongoing project in which several universities collaborate. ...
Article
Almost eight years after his untimely death, the scientific contribution of Emanuele Pianta still appears significant to us, in particular for the variety of topics he dealt with and for his ability to move across different areas of computational linguistics. Today, retracing the steps of Emanuele’s scientific career means rediscovering an important part of the scientific challenges that the Italian research community has faced over more than twenty years. In recognition of the role he played, the Italian Association of Computational Linguistics named after Emanuele Pianta its annual award for the best master’s thesis in Computational Linguistics defended at an Italian university.
Article
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of models, which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection, and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art.
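The difference between plain co-occurrence counts and syntax-aware contexts can be illustrated with a much-simplified sketch. It assumes dependency triples `(head, relation, dependent)` are already available from a parser, treats only direct dependency links as contexts, and lets an optional set of allowed relations stand in for linguistic knowledge guiding construction. The triples and function names are hypothetical, and this is a simplification, not the article's formalization.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_space(triples, allowed_relations=None):
    """Map each word to counts of the words it is directly linked to.

    triples: iterable of (head, relation, dependent) tuples from a parser.
    allowed_relations: optional set of relation labels to keep; None keeps all.
    """
    space = defaultdict(Counter)
    for head, rel, dep in triples:
        if allowed_relations is not None and rel not in allowed_relations:
            continue
        # A dependency link counts as a context for both words it connects.
        space[head][dep] += 1
        space[dep][head] += 1
    return space


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical parser output for a tiny corpus.
TRIPLES = [
    ("drink", "dobj", "coffee"),
    ("drink", "dobj", "tea"),
    ("brew", "dobj", "coffee"),
    ("brew", "dobj", "tea"),
    ("drive", "dobj", "car"),
    ("drink", "nsubj", "man"),
]

if __name__ == "__main__":
    space = build_space(TRIPLES)
    print(round(cosine(space["coffee"], space["tea"]), 3))  # 1.0
    print(cosine(space["coffee"], space["car"]))            # 0.0
```

Here "coffee" and "tea" come out as maximally similar because they fill the same syntactic slot of the same verbs, whereas a bag-of-words window would also count unrelated nearby words.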
Chapter
This paper situates the linguistic-semantic use of frames at the meeting point of two traditions of using the word 'frames': in general cognitive science, where its connection to language is missing or incidental, and in linguistics and artificial intelligence, where it was explicitly used to identify the combinatory possibilities of lexical units, both syntactically (e.g., subcategorization frames) and semantically (e.g., case frames). Examples are offered of the kinds of empirical and cognitive studies possible on English words and texts, and a sketch is provided of a large research effort devoted to the creation of a corpus-supported, frame-based lexicon of English.
Article
Vector-based models of word meaning have become increasingly popular in cognitive science. The appeal of these models lies in their ability to represent meaning simply by using distributional information under the assumption that words occurring within similar contexts are semantically similar. Despite their widespread use, vector-based models are typically directed at representing words in isolation, and methods for constructing representations for phrases or sentences have received little attention in the literature. This is in marked contrast to experimental evidence (e.g., in sentential priming) suggesting that semantic similarity is more complex than simply a relation between isolated words. This article proposes a framework for representing the meaning of word combinations in vector space. Central to our approach is vector composition, which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models that we evaluate empirically on a phrase similarity task.
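The two composition families named in the abstract can be written in a few lines. In one common formulation, an additive model combines word vectors u and v as p = αu + βv (plain addition when α = β = 1) and a multiplicative model takes their component-wise product p = u ⊙ v. The sketch uses plain Python lists and made-up three-dimensional vectors; the vectors, function names and cosine-based comparison are illustrative assumptions, not the article's experimental setup.

```python
from math import sqrt


def additive(u, v, alpha=1.0, beta=1.0):
    """Weighted additive composition: p = alpha*u + beta*v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return [alpha * a + beta * b for a, b in zip(u, v)]


def multiplicative(u, v):
    """Multiplicative composition: component-wise product of u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Made-up word vectors over three context dimensions.
old = [1.0, 0.0, 2.0]
man = [2.0, 1.0, 0.0]
elderly = [1.0, 0.0, 1.0]
gentleman = [2.0, 1.0, 1.0]

if __name__ == "__main__":
    print(additive(old, man))        # [3.0, 1.0, 2.0]
    print(multiplicative(old, man))  # [2.0, 0.0, 0.0]
    # Phrase similarity: compare the composed phrase vectors.
    p1 = multiplicative(old, man)
    p2 = multiplicative(elderly, gentleman)
    print(round(cosine(p1, p2), 3))  # 0.894
```

Multiplicative composition zeroes out any dimension on which either word has no weight, so it emphasizes the contexts the two words share; addition instead keeps every context of both words.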