Fausto Giunchiglia

Università degli Studi di Trento | UNITN · Department of Information Engineering and Computer Science

About

560
Publications
72,288
Reads
16,197
Citations
Citations since 2016
87 Research Items
3045 Citations
[Chart: citations per year, 2016 to 2022; vertical axis 0 to 400]

Publications (560)
Preprint
Full-text available
Mood inference with mobile sensing data has been studied in ubicomp literature over the last decade. This inference enables context-aware and personalized user experiences in general mobile apps and valuable feedback and interventions in mobile health apps. However, even though model generalization issues have been highlighted in many studies, the...
Chapter
More and more, with the growing focus on large scale analytics, we are confronted with the need of integrating data from multiple sources. The problem is that these data are impossible to reuse as-is. The net result is high cost, with the further drawback that the resulting integrated data will again be hardly reusable as-is. iTelos (Not to be conf...
Preprint
Full-text available
More and more, with the growing focus on large scale analytics, we are confronted with the need of integrating data from multiple sources. The problem is that these data are impossible to reuse as-is. The net result is high cost, with the further drawback that the resulting integrated data will again be hardly reusable as-is. iTelos is a general pu...
Article
Full-text available
We introduce and study knowledge drift (KD), a special form of concept drift that occurs in hierarchical classification. Under KD the vocabulary of concepts, their individual distributions, and the is-a relations between them can all change over time. The main challenge is that, since the ground-truth concept hierarchy is unobserved, it is hard to...
Preprint
Full-text available
One of the major barriers to the training of algorithms on knowledge graph schemas, such as vocabularies or ontologies, is the difficulty that scientists have in finding the best input resource to address the target prediction tasks. In addition to this, a key challenge is to determine how to manipulate (and embed) these data, which are often in th...
Preprint
Full-text available
Semantic Heterogeneity is conventionally understood as the existence of variance in the representation of a target reality when modelled, by independent parties, in different databases, schemas and/or data. We argue that the mere encoding of variance, while being necessary, is not sufficient to deal with the problem of representational hete...
Conference Paper
Federated Learning (FL) is an emerging privacy-aware machine learning technique that applies successfully to the collaborative learning of global models for Human Activity Recognition (HAR). As of now, the applications of FL for HAR assume that the data associated with diverse individuals follow the same distribution. However, this assumption is im...
Preprint
We propose a model of the situational context of a person and show how it can be used to organize and, consequently, reason about massive streams of sensor data and annotations, as they can be collected from mobile devices, e.g. smartphones, smartwatches or fitness trackers. The proposed model is validated on a very large dataset about the everyday...
Preprint
Full-text available
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian,...
Preprint
Part-prototype Networks (ProtoPNets) are concept-based classifiers designed to achieve the same performance as black-box models without compromising transparency. ProtoPNets compute predictions based on similarity to class-specific part-prototypes learned to recognize parts of training examples, making it easy to faithfully determine what examples...
Preprint
Full-text available
Metonymy is regarded as a universally shared cognitive phenomenon; as such, humans are taken to effortlessly produce and comprehend metonymic senses. However, experimental studies on metonymy have been focused on Western societies, and the linguistic data backing up claims of universality has not been large enough to provide conclusive evidence. We...
Preprint
We focus on the development of AIs which live in lifelong symbiosis with a human. The key prerequisite for this task is that the AI understands - at any moment in time - the personal situational context that the human is in. We outline the key challenges that this task brings forth, namely (i) handling the human-like and ego-centric nature of the t...
Preprint
Full-text available
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data i...
Chapter
The increasing use of smart devices allows us to extract massive streams of data, e.g., sensor streams, questionnaires, answers, annotations, etc. This information is crucial for the recognition of people’s behaviours and habits. The main challenge is how to represent and organize such large scale, complex and heterogeneous data streams. This repre...
Preprint
Full-text available
This paper describes a method to enrich lexical resources with content relating to linguistic diversity, based on knowledge from the field of lexical typology. We capture the phenomenon of diversity through the notions of lexical gap and language-specific word and use a systematic method to infer gaps semi-automatically on a large scale. As a first...
Article
Full-text available
Mobile Crowd Sensing (MCS) is a novel IoT paradigm where sensor data, as collected by the user’s mobile devices, are integrated with user-generated content, e.g., annotations, self-reports, or images. While providing many advantages, the human involvement also brings big challenges, where the most critical is possibly the poor quality of human-prov...
Preprint
Full-text available
The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract notion of diversity visually understandable for humans and formally exploitable by machines. The UKC...
Article
Full-text available
We present CogNet, a large-scale, automatically-built database of sense-tagged cognates—words of common origin and meaning across languages. CogNet is continuously evolving: its current version contains over 8 million cognate pairs over 338 languages and 35 writing systems, with new releases already in preparation. The paper presents the algorithm...
Preprint
Full-text available
Diversity-aware platform design is a paradigm that responds to the ethical challenges of existing social media platforms. Available platforms have been criticized for minimizing users' autonomy, marginalizing minorities, and exploiting users' data for profit maximization. This paper presents a design solution that centers the well-being of users. I...
Preprint
Full-text available
Recent work in Machine Learning and Computer Vision has provided evidence of systematic design flaws in the development of major object recognition benchmark datasets. One such example is ImageNet, wherein, for several categories of images, there are incongruences between the objects they represent and the labels used to annotate them. The conseque...
Chapter
Full-text available
Here, the psychological construct of transitional objects is presented, along with animism and suspension of disbelief in order to provide guidelines and inspiration for Human-Robot Interaction directed to young users. In particular, we will focus on bedtime and children sleeping alone, which is critical after the separation from their mother. Also...
Preprint
Full-text available
We base our work on the teleosemantic modelling of concepts as abilities implementing the distinct functions of recognition and classification. Accordingly, we model two types of concepts - substance concepts suited for object recognition exploiting visual properties, and classification concepts suited for classification of substance concepts explo...
Conference Paper
Full-text available
Measuring the quality of lexical-semantic resources is a challenging problem. In this paper, we describe a general approach for quality evaluation in lexical-semantic resources in terms of the quality of their synsets. We also introduce a complete definition for the quality of lexical-semantic resources as a set of synset incorrectness, incomplete...
Conference Paper
Full-text available
With the increase of the lexical-semantic resources built over time, lexicon content quality has gained significant attention from Natural Language Processing experts such as lexicographers and linguists. Estimating lexicon quality components like synset lemmas, synset gloss, or synset relations are challenging research problems for Natural Languag...
Article
Full-text available
Lexical Semantics is concerned with how words encode mental representations of the world, i.e., concepts. We call this type of concepts, classification concepts. In this paper, we focus on Visual Semantics, namely, on how humans build concepts representing what they perceive visually. We call this second type of concepts, substance concepts. As sho...
Preprint
We are concerned with debugging concept-based gray-box models (GBMs). These models acquire task-relevant concepts appearing in the inputs and then compute a prediction by aggregating the concept activations. This work stems from the observation that in GBMs both the concepts and the aggregation function can be affected by different bugs, and that c...
Preprint
We are interested in dealing with the heterogeneity of Knowledge bases (KBs), e.g., ontologies and schemas, modeled as sets of entity types (etypes), e.g., person, where each etype is associated with a set of properties, e.g., age or height, via an inheritance hierarchy. A huge literature exists on this topic. A common approach is to model KBs as g...
Conference Paper
We assume that substances in the world are represented by two types of concepts, namely substance concepts, as originally introduced by Ruth Millikan, and classification concepts, the former instrumental to (visual) perception, the latter to (language based) classification. Based on this distinction, we introduce a general methodology for building...
Chapter
The main goal of this paper is to evaluate knowledge base schemas, modeled as a set of entity types, each such type being associated with a set of properties, according to their focus. We model the notion of focus as “the state or quality of being relevant in storing and retrieving information”. This definition of focus is adapted from the notion o...
Chapter
Full-text available
Lexical similarity data, quantifying the “proximity” of languages based on the similarity of their lexicons, has been increasingly used to estimate the cross-lingual reusability of language resources, for tasks such as bilingual lexicon induction or cross-lingual transfer. Existing similarity data, however, originates from the field of comparative...
Article
Full-text available
Knowledge graph-based data integration is a practical methodology for heterogeneous legacy database-integrated service construction. However, it is neither efficient nor economical to build a new cross-domain knowledge graph on top of the schemas of each legacy database for the specific integration application rather than reusing the existing high-...
Conference Paper
Full-text available
Large-scale morphological databases provide essential input to a wide range of NLP applications. Inflectional data is of particular importance for morphologically rich (agglutinative and highly inflecting) languages, and derivations can be used, e.g. to infer the semantics of out-of-vocabulary words. Extending the scope of state-of-the-art multilin...
Article
Full-text available
Various studies have investigated the predictability of different aspects of human behavior such as mobility patterns, social interactions, and shopping and online behaviors. However, existing research has often been limited to a single behavioral dimension or to a combination of a few, and it has adopted the perspective of an outside ob...
Conference Paper
Diversity-aware platform design is a paradigm that responds to the ethical challenges of existing social media platforms. Available platforms have been criticized for minimizing users' autonomy, marginalizing minorities, and exploiting users' data for profit maximization. This paper presents a design solution that centers the well-being of users....
Conference Paper
Full-text available
The main goal of this paper is to evaluate knowledge base schemas, modeled as a set of entity types, each such type being associated with a set of properties, according to their focus. We model the notion of focus as "the state or quality of being relevant in storing and retrieving information". This definition of focus is adapted from the notion o...
Article
Full-text available
In this paper, we propose an architecture supporting online open communities, where by open communities, we mean communities where previously unknown people can join, possibly for a limited amount of time. The fundamental question that we address is “how we can make sure that an individual’s requirements are taken into consideration by the communi...
Article
The first FATE Winter School, organized by the Cyprus Center for Algorithmic Transparency (CyCAT) provided a forum for both students as well as senior researchers to examine the complex topic of Fairness, Accountability, Transparency and Ethics (FATE). Through a program that included two invited keynotes, as well as sessions led by CyCAT partners a...
Preprint
Full-text available
We assume that substances in the world are represented by two types of concepts, namely substance concepts and classification concepts, the former instrumental to (visual) perception, the latter to (language based) classification. Based on this distinction, we introduce a general methodology for building lexico-semantic hierarchies of substance con...
Preprint
Full-text available
We propose a novel approach to the problem of semantic heterogeneity where data are organized into a set of stratified and independent representation layers, namely: conceptual (where a set of unique alinguistic identifiers are connected inside a graph codifying their meaning), language (where sets of synonyms, possibly from multiple languages, annot...
Preprint
Full-text available
It is a fact that, when developing a new application, it is virtually impossible to reuse, as-is, existing datasets. This difficulty is the cause of additional costs, with the further drawback that the resulting application will again be hardly reusable. It is a negative loop which consistently reinforces itself and for which there seems to be no w...
Preprint
Full-text available
As the role of algorithmic systems and processes increases in society, so does the risk of bias, which can result in discrimination against individuals and social groups. Research on algorithmic bias has exploded in recent years, highlighting both the problems of bias, and the potential solutions, in terms of algorithmic transparency (AT). Transpar...
Preprint
Full-text available
Mitigating bias in algorithmic systems is a critical issue drawing attention across communities within the information and computer sciences. Given the complexity of the problem and the involvement of multiple stakeholders, including developers, end-users and third-parties, there is a need to understand the landscape of the sources of bias, and the...
Chapter
Full-text available
The aim of transfer learning is to reuse learnt knowledge across different contexts. In the particular case of cross-domain transfer (also known as domain adaptation), reuse happens across different but related knowledge domains. While there have been promising first results in combining learning with symbolic knowledge to improve cross-domain tran...
Conference Paper
Full-text available
We set out to uncover the unique grammatical properties of an important yet so far under-researched type of natural language text: that of short labels typically found within structured datasets. We show that such labels obey a specific type of abbreviated grammar that we call the Language of Data, with properties significantly different from the k...
Conference Paper
Full-text available
As medical research becomes ever finer-grained, experiments require healthcare data in quantities that single countries cannot provide. Cross-jurisdictional data collection remains, however, extremely challenging due to the diverging legal, professional, linguistic, normative, and technological contexts of the participating countries. Medical data...
Conference Paper
Full-text available
The aim of transfer learning is to reuse learnt knowledge across different contexts. In the particular case of cross-domain transfer (also known as domain adaptation), reuse happens across different but related knowledge domains. While there have been promising first results in combining learning with symbolic knowledge to improve cross-domain tran...
Conference Paper
Full-text available
We present a new wordnet resource for Scottish Gaelic, a Celtic minority language spoken by about 60,000 speakers, most of whom live in Northwestern Scotland. The wordnet contains over 15 thousand word senses and was constructed by merging ten thousand new, high-quality translations, provided and validated by language experts, with an existing wordn...
Chapter
While normative systems have excelled at addressing issues such as coordination and cooperation, they have left a number of open challenges. The first is how to reconcile individual goals with community goals, without breaching the individual’s privacy. The evolution of norms driven by individuals’ behaviour or argumentation has helped take the in...
Article
Full-text available
Natural language understanding is a key task in a wide range of applications targeting data interoperability or analytics. For the analysis of domain-specific data, specialised knowledge resources (terminologies, grammars, word vector models, lexical databases) are necessary. The heterogeneity of such resources is, however, a major obstacle to thei...
Conference Paper
Full-text available
The paper focuses on two pivotal cognitive functions of both natural and AI agents, namely classification and identification. Inspired by the theory of teleosemantics, itself based on neuroscientific results, we show that these two functions are complementary and rely on distinct forms of knowledge representation. We provide a new perspective on...
Preprint
This paper presents ALP, an entirely new linguistic pipeline for natural language processing of text in Modern Standard Arabic. Contrary to the conventional pipeline architecture, we solve common NLP operations of word segmentation, POS tagging, and named entity recognition as a single sequence labeling task. Based on this single component, we al...
Presentation
Full-text available
Lemmatization—computing the canonical forms of words in running text—is an important component in any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the rich morphology, agglutinative aspects, and lexical ambiguity due to th...
Conference Paper
Full-text available
Lemmatization—computing the canonical forms of words in running text—is an important component in any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the rich morphology, agglutinative aspects, and lexical ambiguity due to t...
Article
Full-text available
Multilingual lexico-semantic resources are used in different semantic services such as meaning extraction or data integration and linking, which are essential for the development of real-world applications. However, their use is hampered by the lack of maintenance and quality control mechanisms over their content. The Universal Knowledge Core (UKC)...