Featured projects (1)

ALP is an Arabic linguistic pipeline that performs the following tasks:
  • Arabic word tokenization (segmentation)
  • Arabic POS tagging
  • Arabic lemmatization
  • Arabic named entity recognition
  • Arabic chunking
The tool is free for research and personal use. Tool web page: http://arabicnlp.pro

Featured research (5)

Semantic Heterogeneity is conventionally understood as the existence of variance in the representation of a target reality when modelled, by independent parties, in different databases, schemas and/or data. We argue that the mere encoding of variance, while necessary, is not sufficient to deal with the problem of representational heterogeneity, given that it is also necessary to encode the unifying basis on which such variance is manifested. To that end, this paper introduces a notion of Representation Heterogeneity in terms of the co-occurrent notions of Representation Unity and Representation Diversity. We have representation unity when two heterogeneous representations model the same target reality, and representation diversity otherwise. In turn, this paper also highlights how these two notions get instantiated across the two layers of any representation, i.e., Language and Knowledge.
The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract notion of diversity visually understandable for humans and formally exploitable by machines. The UKC website lets users explore millions of individual words and their meanings, but also phenomena of cross-lingual convergence and divergence, such as shared interlingual meanings, lexicon similarities, cognate clusters, or lexical gaps. The UKC LiveLanguage Catalogue, in turn, provides access to the underlying lexical data in a computer-processable form, ready to be reused in cross-lingual applications.
Recent work in Machine Learning and Computer Vision has provided evidence of systematic design flaws in the development of major object recognition benchmark datasets. One such example is ImageNet, wherein, for several categories of images, there are incongruences between the objects they represent and the labels used to annotate them. The consequences of this problem are major, in particular considering the large number of machine learning applications, not least those based on Deep Neural Networks, that have been trained on these datasets. In this paper, we posit the problem to be the lack of a knowledge representation (KR) methodology providing the foundations for the construction of these ground truth benchmark datasets. Accordingly, we propose a solution articulated in three main steps: (i) deconstructing the object recognition process into four ordered stages grounded in the philosophical theory of teleosemantics; (ii) based on such stratification, proposing a novel four-phased methodology for organizing objects in classification hierarchies according to their visual properties; and (iii) performing such classification according to the faceted classification paradigm. The key novelty of our approach lies in the fact that we construct the classification hierarchies from visual properties exploiting visual genus-differentiae, and not from linguistically grounded properties. The proposed approach is validated by a set of experiments on the ImageNet hierarchy of musical instruments.
We base our work on the teleosemantic modelling of concepts as abilities implementing the distinct functions of recognition and classification. Accordingly, we model two types of concepts: substance concepts, suited for object recognition exploiting visual properties, and classification concepts, suited for classification of substance concepts exploiting linguistically grounded properties. The goal of this paper is to demonstrate that object recognition can be construed as classification via visual properties, as distinct from work in mainstream computer vision. To that end, we present an object recognition process based on Ranganathan's four-phased faceted knowledge organization process, grounded in the teleosemantic distinctions of substance concept and classification concept. We also briefly introduce the ongoing project MultiMedia UKC, whose aim is to build an object recognition resource following our proposed process.
We assume that substances in the world are represented by two types of concepts, namely substance concepts, as originally introduced by Ruth Millikan, and classification concepts, the former instrumental to (visual) perception, the latter to (language-based) classification. Based on this distinction, we introduce a general methodology for building lexico-semantic hierarchies of substance concepts, where nodes are annotated with the media, e.g., videos or photos, from which substance concepts are extracted, and are associated with the corresponding classification concepts. The methodology is based on Ranganathan’s original faceted approach, contextualized to the problem of classifying substance concepts. The key novelty is that the hierarchy is built exploiting the visual properties of substance concepts, while the linguistically defined properties of classification concepts are only used to describe substance concepts. The validity of the approach is exemplified by providing some highlights of an ongoing project whose goal is to build a large-scale multimedia multilingual concept hierarchy.

Lab head

Fausto Giunchiglia
  • Department of Information Engineering and Computer Science

Members (7)

Gábor Bella
  • Università degli Studi di Trento
Abed Alhakim Freihat
  • Università degli Studi di Trento
Mayukh Bagchi
  • Università degli Studi di Trento
Khuyagbaatar Batsuren
  • National University of Mongolia
Alessio Zamboni
  • Università degli Studi di Trento
Yamini Chandrashekar
  • Università degli Studi di Trento
Samuele Conti
  • Università degli Studi di Trento