Federico Sangati

University of Naples "L'Orientale" | IUO · Department of Linguistics

Doctor of Philosophy

About

49 Publications
4,462 Reads
283 Citations
Citations since 2016
30 Research Items
222 Citations
[Chart: citations by year, 2016 to 2022]
Introduction
I am a freelancer and an NLP specialist working at the UNIOR NLP Research Group at the University of Naples. My research mainly focuses on corpus analysis, grammars (parsing), multi-word expressions, machine translation, and natural language understanding.

Publications

Publications (49)
Conference Paper
Full-text available
We evolve artificial agents to perform a simple tracking task in three conditions: one individual (Isolated Condition) and two joint action conditions with division of labor. The joint conditions differ by whether two agents switch complementary roles during the task (Generalist Condition) or always play the same role (Specialist Condition). At the...
Chapter
Inspired by the historical models of artificial and auxiliary languages, Emojitaliano is the result of a social and crowdsourcing experiment which was conducted by a group of seventeen translators, followers of the “Scritture brevi” blog, and led to the creation of an international language based on emojis. The experiment was carried out during 201...
Article
Full-text available
The social brain hypothesis proposes that enlarged brains have evolved in response to the increasing cognitive demands that complex social life in larger groups places on primates and other mammals. However, this reasoning can be challenged by evidence that brain size has decreased in the evolutionary transitions from solitary to social larger grou...
Conference Paper
Full-text available
In this paper, we present an experiment aimed at evaluating whether expert-quality linguistic knowledge about Romanian synonyms could be crowdsourced from L1 language learners, learning Romanian as their mother tongue, by collecting and aggregating their answers to two types of questions that are automatically generated from a datas...
Conference Paper
Full-text available
This paper investigates a general framework for synchronous educational language games that simultaneously allows researchers to crowdsource learner answers in a controlled environment. Our prototype Substituto allows teachers and students to interact in real time while undergoing language learning exercises, ensuring that the learner's progress is...
Conference Paper
In this paper, we describe a Telegram bot, Mago della Ghigliottina (Ghigliottina Wizard), able to solve La Ghigliottina game (The Guillotine), the final game of the Italian TV quiz show L'Eredità. Our system relies on linguistic resources and artificial intelligence and achieves better results than human players (and competitors of L'Eredità too)....
Conference Paper
Full-text available
In this work, we report on a crowdsourcing experiment conducted using the V-TREL vocabulary trainer, which is accessed via a Telegram chatbot interface, to gather knowledge on word relations suitable for expanding ConceptNet. V-TREL is built on top of a generic architecture implementing the implicit crowdsourcing paradigm in order to offer vocabulary...
Conference Paper
Full-text available
We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercise...
Preprint
Full-text available
In this paper, we describe a Telegram bot, Mago della Ghigliottina (Ghigliottina Wizard), able to solve La Ghigliottina game (The Guillotine), the final game of the Italian TV quiz show L'Eredità. Our system relies on linguistic resources and artificial intelligence and achieves better results than human players (and competitors of L'Eredità too)....
Chapter
This paper describes Il mago della Ghigliottina, a bot which took part in the Ghigliottin-AI task of the Evalita 2020 evaluation campaign. The aim is to build a system able to solve the TV game “La Ghigliottina”. Our system has already participated in the Evalita 2018 task NLP4FUN. Compared to that occasion, it improved its accuracy from 61% to 68....
Chapter
Evaluating Artificial Players for the Language Game “La Ghigliottina” (Ghigliottin-AI) task is one of the tasks organized in the context of the 2020 EVALITA edition, a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. Ghigliottin-AI participants are asked to build an artificial player able...
Conference Paper
Full-text available
In this paper, we present our work on developing a vocabulary trainer that uses exercises generated from language resources such as ConceptNet and crowdsources the responses of the learners to enrich the language resource. We performed an empirical evaluation of our approach with 60 non-native speakers over two days, which shows that new entries to...
Conference Paper
Full-text available
We present an architecture for crowdsourcing language resources from language learners and a prototype implementation of it as a vocabulary trainer. The vocabulary trainer relies on lexical resources from the ConceptNet semantic network to generate exercises while using the learners' answers to improve the resources used for the exercise generation...
Conference Paper
Full-text available
The paper describes UNIOR4NLP, a system developed to solve the "La Ghigliottina" game, which took part in the NLP4FUN task of the Evalita 2018 evaluation campaign. The system was the best-performing one in the competition and achieves better results than human players.
Conference Paper
Full-text available
In this paper, we present the enetCollect COST Action, a large network project, which aims at initiating a new Research and Innovation (R&I) trend on combining the well-established domain of language learning with recent and successful crowdsourcing approaches. We introduce its objectives, and describe its organization. We then present the Italian...
Conference Paper
This contribution describes the UNIOR4NLP system, developed to solve the game “La Ghigliottina”, which took part in the NLP4FUN challenge of the Evalita 2018 evaluation campaign. The system was the best in the competition and performs better than humans.
Book
Full-text available
Multiword expressions (MWEs) are known as a “pain in the neck” due to their idiosyncratic behaviour. While some categories of MWEs have been largely studied, verbal MWEs (VMWEs) such as to take a walk, to break one’s heart or to turn off have been relatively rarely modelled. We describe an initiative meant to bring about substantial progress in und...
Article
Full-text available
The present report summarizes an exploratory study which we carried out in the context of the COST Action IS1310 "Reassembling the Republic of Letters, 1500-1800", and which is relevant to the activities of Working Group 3 "Texts and Topics" and Working Group 2 "People and Networks". In this study we investigated the use of Natural Language Process...
Conference Paper
Full-text available
This paper summarizes the preliminary results of an ongoing survey on multiword resources carried out within the IC1207 Cost Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogs and the inventory of multi-word datasets on the SIGLEX-MWE website, multiword resources are scattered and difficult t...
Conference Paper
Full-text available
The translation of Multiword expressions (MWE) by Machine Translation (MT) represents a big challenge, and although MT has considerably improved in recent years, MWE mistranslations still occur very frequently. There is a need to develop large data sets, mainly parallel corpora, annotated with MWEs, since they are useful both for SMT tra...
Article
Full-text available
The aim of this paper is to present PARSEME, a COST Action devoted to the issue of Multiword Expressions in parsing and in linguistic resources (corpora, lexicons). This is a “meta-paper” intended to be the main citation point for any future work referring to PARSEME: it does not describe in detail any single result of the Action, but rather summar...
Article
Full-text available
In this paper, we present the first incremental parser for Tree Substitution Grammar (TSG). A TSG allows arbitrarily large syntactic fragments to be combined into complete trees; we show how constraints (including lexicalization) can be imposed on the shape of the TSG fragments to enable incremental processing. We propose an efficient Earley-based...
Article
Federico Sangati investigated the learning of syntactic tree structures by means of generalizations over annotated corpora. He focused on several probabilistic models, with three different representations. Sangati formulated a general framework for defining generative models of syntax. In each model...
Conference Paper
We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments,...
Conference Paper
The growing availability of spoken language corpora presents new opportunities for enriching the methodologies of speech and language therapy. In this paper, we present a novel approach for constructing speech motor exercises, based on linguistic knowledge extracted from spoken language corpora. In our study with the Dutch Spoken Corpus, syllabi...
Conference Paper
We present a probabilistic model extension to the Tesnière Dependency Structure (TDS) framework formulated in (Sangati and Mazza, 2009). This representation incorporates aspects from both constituency and dependency theory. In addition, it makes use of junction structures to handle coordination constructions. We test our model on parsing the Englis...
Article
We present a general framework for dependency parsing of Italian sentences based on a combination of discriminative and generative models. We use a state-of-the-art discriminative model to obtain a k-best list of candidate structures for the test sentences, and use the generative model to compute the probability of each candidate, and select the...
Article
Full-text available
In this paper we describe FragmentSeeker, a tool capable of identifying all tree constructions that recur multiple times in a large Phrase Structure treebank. The tool is based on an efficient kernel-based dynamic algorithm, which compares every pair of trees of a given treebank and computes the list of fragments which they bot...
Conference Paper
Full-text available
We present several algorithms for assigning heads in phrase structure trees, based on different linguistic intuitions on the role of heads in natural language syntax. The starting point of our approach is the observation that a head-annotated treebank defines a unique lexicalized tree substitution grammar. This allows us to go back and forth between th...
Conference Paper
Full-text available
We propose a framework for dependency parsing based on a combination of discriminative and generative models. We use a discriminative model to obtain a k-best list of candidate parses, and subsequently rerank those candidates using a generative model. We show how this approach allows us to evaluate a variety of generative models, without nee...
Article
During the last decade, the Computational Linguistics community has shown an increased interest in Dependency Treebanks. Several groups have developed new annotated corpora using dependency representation, while other people have proposed several automatic conversion algorithms to transform available Phrase Structure (PS) treebanks into Dependency...
Article
We present a simplified Data-Oriented Parsing (DOP) formalism for learning the constituency structure of Italian sentences. In our approach we try to simplify the original DOP methodology by constraining the number and type of fragments we extract from the training corpus. We provide some examples of the types of constructions that occur more often...
Article
Towards simpler tree substitution grammars. Federico Sangati. Abstract: In this thesis we will investigate several supervised methods of learning the syntactic structure of natural languages. Supervised learning is one of several machine learning paradigms. It differs from the unsupervised methodologies in the fact that it learns from a number of exi...

Projects

Projects (2)
Project
The PARSEME-IT project aims at improving the linguistic representativeness, precision, robustness and computational efficiency of Natural Language Processing (NLP) applications, in particular for the Italian language. The project focuses on a major bottleneck of these applications: MultiWord Expressions (MWEs), that is, groups of words that must be treated as units at some level of linguistic processing, such as hot dog, hard disk, kick the bucket, United Nations and pay attention. The main aim of the project is to bridge the gap between linguistic precision and computational efficiency in NLP applications by investigating the syntactic and semantic representation of MWEs in language resources and the integration of MWE analysis in syntactic parsing and translation technology. Expected deliverables mainly include enhanced monolingual language resources (lexicons, grammars and annotated corpora) in Italian, or multilingual linguistic resources that include the Italian language. This project is a spin-off of PARSEME, a European IC1207 COST action on the same topic.
Project
@EmojiWorldBot (https://telegram.me/emojiworldbot) is a multilingual dictionary that uses emoji as a pivot for contributors among dozens of diverse languages. Currently we support emoji-to-word and word-to-emoji for 72 languages imported from the Unicode tables (see http://www.unicode.org/cldr/charts/29/annotations). This is just a start! Future releases will enable you to help us:
1. Add new languages
2. Add new terms for current languages (including country names for national flags)
3. Match language-to-language: using this bot to crowdsource (via gamification techniques) very accurate bilingual dictionaries between any two languages
EmojiWorldBot is a free public service produced by Federico Sangati (Netherlands), Martin Benjamin and Sina Mansour at Kamusi Project International and EPFL (Switzerland), Francesca Chiusaroli at University of Macerata (Italy) (http://docenti.unimc.it/f.chiusaroli), and Johanna Monti at University of Naples “L’Orientale” (Italy). @EmojiWorldBot version 0.91