
Floriana Esposito
Professor
Università degli Studi di Bari Aldo Moro | Università di Bari · Department of Computer Science
About
620 Publications
81,222 Reads
5,460 Citations
Introduction
Floriana Esposito worked from 1974 at the Department of Computer Science, Università degli Studi di Bari Aldo Moro, conducting research from the outset in Pattern Recognition, Artificial Intelligence, Machine Learning and Data Mining. In 1989 she founded the LACAM laboratory at Uniba. She retired in November 2018. Her research interests include: 1) Machine Learning methods and systems based on symbolic and numerical approaches. 2) Integration of probabilistic methods in first-order logic for the development of techniques for reasoning under uncertainty. 3) Inductive Logic Programming: Incremental and Multistrategy Learning. 4) Inductive methods for ontologies and Semantic Web representations. 5) Sum-Product Networks and probabilistic architectures for Deep Learning.
Additional affiliations
October 1994 - present
May 1994 - June 1994
November 1984 - October 1994
Publications (620)
The data descriptions of the units are called "symbolic" when they are more complex than standard ones due to the fact that they contain internal variation and are structured. Symbolic data arise from many sources, for instance in order to summarize huge Relational Data Bases by their underlying concepts. "Extracting knowledge" means getting explan...
A fully semantic measure is presented which can compute a similarity value between concept descriptions, between a concept description and an individual, or between individuals expressed in an expressive description logic. It is applicable to symbolic descriptions, although it uses a numeric approach for the computation. Considering that D...
In this paper, we address the problem of retrospectively pruning decision trees induced from data, according to a top-down approach. This problem has received considerable attention in the areas of pattern recognition and machine learning, and many distinct methods have been proposed in literature. We make a comparative study of six well-known prun...
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of an OCR to some parts of the document image is only one of the problems. In fact, the generation of documents in HTML format is easier when the layout structure of a page has be...
Positive-Unlabeled (PU) learning works by considering a set of positive samples, and a (usually larger) set of unlabeled ones. This challenging setting requires algorithms to cleverly exploit dependencies hidden in the unlabeled data in order to build models able to accurately discriminate between positive and negative samples. We propose to exploi...
Sum-Product Networks (SPNs) are recently introduced deep tractable probabilistic models by which several kinds of inference queries can be answered exactly and in a tractable time. Up to now, they have been largely used as black box density estimators, assessed only by comparing their likelihood scores. In this paper we explore and exploit the...
Sum-Product Networks (SPNs) are recently introduced deep probabilistic models providing exact and tractable inference. SPNs have been successfully employed in several application domains, from computer vision to natural language processing, as accurate density estimators. However, learning their structure and parameters from high dimensional data p...
Multi-label classification (MLC) is a challenging task in machine learning that consists of predicting multiple labels associated with a single instance. Promising approaches for MLC are those able to capture label dependencies by learning a single probabilistic model, unlike other competitive approaches that require learning many models....
Slides presented at EKAW 2018
The paper presents the ultimate version of a concept learning system which can support typical ontology construction / evolution tasks through the induction of class expressions from groups of individual resources labeled by a domain expert. Stating the target task as a search problem, a Foil-like algorithm was devised based on the employment of r...
A prominent class of supervised methods for the representations adopted in the context of the Web of Data are designed to solve concept learning problems. Such methods aim at approximating an intensional definition for a target concept from a set of individuals of a target knowledge base. In this scenario, most of the well-known solutions exploit a...
Multi-label classification (MLC) is a challenging task in machine learning that consists of predicting multiple labels associated with a single instance. Promising approaches for MLC are those able to capture label dependencies by learning a single probabilistic model, unlike other competitive approaches that require learning many models....
Sum-Product Networks (SPNs) are deep tractable probabilistic models by which several kinds of inference queries can be answered exactly and in a tractable time. They have been largely used as black box density estimators, assessed by comparing their likelihood scores on different tasks. In this paper we explore and exploit the inner representations...
Sum-Product Networks (SPNs) are a deep probabilistic architecture that up to now has been successfully employed for tractable inference. Here, we extend their scope towards unsupervised representation learning: we encode samples into continuous and categorical embeddings and show that they can also be decoded back into the original input space by l...
While all kinds of mixed data (from personal data, over panel and scientific data, to public and commercial data) are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time in identifying the parametric form of the random variables (Gaussian, Poisson,...
Natural Language Processing techniques are of utmost importance for the proper management of Digital Libraries. These techniques are based on language-specific linguistic resources, that might be unavailable for many languages. Since manually building them is costly, time-consuming and error-prone, it would be desirable to learn these resources aut...
Sum-Product Networks (SPNs) are a deep probabilistic architecture that up to now has been successfully employed for tractable inference. Here, we extend their scope towards unsupervised representation learning: we encode samples into continuous and categorical embeddings and show that they can also be decoded back into the original input space by...
While all kinds of mixed data (from personal data, over panel and scientific data, to public and commercial data) are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time in identifying the parametric form of the random variables (Gaussian, Poisson, Logi...
In the context of the Semantic Web, assigning individuals to their respective classes is a fundamental reasoning service. It has been shown that, when purely deductive reasoning falls short, this problem can be solved as a prediction task to be accomplished through inductive classification models built upon the statistical evidence elicited from on...
Cutset Networks (CNets) are density estimators leveraging context-specific independencies recently introduced to provide exact inference in polynomial time. Learning a CNet is done by firstly building a weighted probabilistic OR tree and then estimating tractable distributions as its leaves. Specifically, selecting an optimal OR split node requires...
The Web of Data, which is one of the dimensions of the Semantic Web (SW), represents a tremendous source of information, which motivates the increasing attention to the formalization and application of machine learning methods for solving tasks such as concept learning, link prediction, inductive instance retrieval in this context. However, the Web...
Discussions on social Web platforms carry a lot of information which is more and more difficult to analyze. Given a virtual community of users that discuss a particular topic of interest, an important task is to extract a model of the whole debate in order to automatically evaluate what are the most reliable claims. This paper proposes to approach...
In the phase of evaluation of accepted arguments, one may find that not all the arguments of discussion are essential when drawing conclusions. Especially when the cardinality of the set of arguments is high, the task of identifying the most relevant arguments of the whole discussion in huge Argument Systems through the analysis of its synthesis ma...
This paper aims at studying complex behaviors of search and rescue robots in emergency situations. We used NetProLogo as the simulation environment in order to: i) build a simulated scenario with robots, human beings, and emergency exits, ii) endow robots with reasoning rules, and iii) evaluate the robots' behavior on the basis of two search strateg...
Sum-Product Networks (SPNs) are recent deep probabilistic models providing exact and tractable inference. SPNs have been successfully employed as density estimators in several application domains. However, learning an SPN from high dimensional data still poses a challenge in terms of time complexity. This is due to the high cost of determining ind...
Several high-level tasks in the management of Digital Libraries require the application of Natural Language Processing (NLP) techniques. In turn, most NLP solutions are based on linguistic resources that are costly to produce, and so motivate research for automated ways to build them. In particular, Language Identification is a crucial NLP task, th...
While all kinds of mixed data (from personal data, over panel and scientific data, to public and commercial data) are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time in identifying the parametric form of the random variables (Gaussian, Poisson, Lo...
Recent studies suggest that robots can play an important role in coping with Autistic Spectrum Disorder (ASD). This paper presents a multimodal interface based on a multilevel treatment protocol customized to improve eye contact, joint attention, and imitation. An evaluation of the system has been performed involving 6 high-functioning children with autism sp...
In addition to the classical exploitation as a means for checking process enactment conformance, process models may be used to predict which activities will be carried out next. The prediction performance may provide indirect indications on the correctness and reliability of a process model. This paper proposes a strategy for activity prediction us...
Medical diagnosis in general is a hard task, requiring significant skill and expertise. Psychological diagnosis, in particular, is peculiar for several reasons: since the illness is mental rather than physical, no instrumental measurements can be done, more subjectivity is involved in the diagnostic process, and there is more chance of comorbidity....
Computational models of argument aim at engaging in argumentation-related activities with human users. In the present work we propose a new generalized version of abstract argument system, called Trust-affected Bipolar Weighted Argumentation Framework (T-BWAF). In this framework, two mainly interacting components are exploited to reason about the ac...
Despite the benefits deriving from explicitly modeling concept disjointness to increase the quality of the ontologies, the number of disjointness axioms in vocabularies for the Web of Data is still limited, thus risking to leave important constraints underspecified. Automated methods for discovering these axioms may represent a powerful modeling to...
While nowadays most newspapers are born-digital (typeset directly in PDF), up to a few years ago they were only available in printed form. Digitizing the paper artifact to make it available in digital libraries yields a sequence of raster images of the pages that make up the documents. Such images consist of just matrices of pixels, and carry no ex...
The possibility for people to leave comments in blogs and forums on the Internet makes it possible to study their attitude (in terms of valence or even of specific feelings) on various topics. For some digital libraries this may be a precious opportunity to understand how their content is perceived by their users and, as a consequence, to suitably direct thei...
Several high-level tasks in the management of Digital Libraries require the application of Natural Language Processing (NLP) techniques. In turn, most NLP solutions are based on linguistic resources that are costly to produce, and so motivate research for automated ways to build them. In particular, Language Identification is a crucial NLP task, that...
Sum-Product Networks (SPNs) are deep density estimators allowing exact and tractable inference. While up to now SPNs have been employed as black-box inference machines, we exploit them as feature extractors for unsupervised Representation Learning. Representations learned by SPNs are rich probabilistic and hierarchical part-based features. SPNs con...
This book constitutes the refereed proceedings of the 16th International Conference of the Italian Association for Artificial Intelligence, AI*IA 2017, held in Bari, Italy, in November 2017.
The 37 full papers presented were carefully reviewed and selected from 91 submissions. The papers are organized in topical sections on applications of AI; nat...
In addition to the classical exploitation as a means for checking process enactment conformance, process models may be precious for making various kinds of predictions about the process enactment itself (e.g., which activities will be carried out next, or which of a set of candidate processes is actually being executed). These predictions may be mu...
Several studies suggest that robots can play a relevant role in addressing Autistic Spectrum Disorder (ASD). This paper presents a humanoid social robot-assisted behavioral system based on a therapeutic multilevel treatment protocol customized to improve eye contact, joint attention, symbolic play, and basic emotion recognition. In the system, the rob...
In the context of the Web of Data, plenty of properties may be used for linking resources to other resources but also to literals that specify their attributes. However the scale and inherent nature of the setting is also characterized by a large amount of missing and incorrect information. To tackle these problems, learning models and rules for pr...
In this paper, we tackle the problem of clustering individual resources in the context of the Web of Data, that is characterized by a huge amount of data published in a standard data model with a well-defined semantics based on Web ontologies. In fact, clustering methods offer an effective solution to support a lot of complex related activities, su...
Building a diversified portfolio is an appealing strategy in the analysis of stock market dynamics. It aims at reducing risk in market capital investments. Grouping stocks by similar latent trend can be cast into a clustering problem. The classical K-Means clustering algorithm does not fit the task of financial data analysis. Hence, we investigate No...
In this work, we tackle the problem of Multi-Label Classification (MLC) by using Cutset Networks (CNets), weighted probabilistic model trees, recently proposed as tractable probabilistic models for discrete distributions. We employ CNets to perform Most Probable Explanation (MPE) inference exactly and efficiently and we improve a state-of-the-art s...
Probabilistic models learned as density estimators can be exploited in representation learning, besides being toolboxes used only to answer inference queries. However, how to extract useful representations highly depends on the particular model involved. We argue that tractable inference, i.e. inference that can be computed in polynomial time, can en...
In this work, we tackle the problem of predicting unknown values of numeric features expressed as datatype properties. The task can be cast as a regression problem for which suitable solutions have been devised, for instance, in the related context of RDBs. However, solving such problems individually does not make it possible to exploit likely correlations exis...
Author identification is a hot topic, especially in the Internet age. Following our previous work in which we proposed a novel approach to this problem, based on relational representations that take into account the structure of sentences, here we present a tool that computes and visualizes a numerical and graphical characterization of the authors/...
We focus on the problem of predicting missing links in large Knowledge Graphs (KGs), so as to discover new facts. Over the last years, latent factor models for link prediction have been receiving increasing interest: they achieve state-of-the-art accuracy in link prediction tasks, while scaling to very large KGs. However, KGs are often endowed with...
The possibility for people to leave comments in blogs and forums on the Internet makes it possible to study their attitude (in terms of valence or even of specific feelings) on various topics. For some digital libraries this may be a precious opportunity to understand how their content is perceived by their users and, as a consequence, to suitably direct the...
Knowledge Graphs (KGs) are a widely used formalism for representing knowledge in the Web of Data. We focus on the problem of link prediction, i.e. predicting missing links in large knowledge graphs, so as to discover new facts about the world. Representation learning models that embed entities and relation types in continuous vector spaces recently we...
Current tools to create OWL-S annotations have been designed starting from the knowledge engineer’s point of view. Unfortunately, the formalisms underlying Semantic Web languages are often obscure to the developers of Web services. To bridge this gap, it is desirable that developers are provided with suitable tools that do not necessarily require k...
The availability on the Internet of huge amounts of blog posts, messages and comments makes it possible to study the attitude of people on various topics. Sentiment Analysis, Opinion Mining and Emotion Analysis denote the area of research in Computer Science aimed at studying, analyzing and classifying text documents based on the underlying opinions expressed...
Integrated Tourism can be defined as the kind of tourism which is explicitly linked to the localities in which it takes place and, in practical terms, has clear connections with local resources, activities, products, production and service industries, and a participatory local community. In this paper we report our experience in applying Artificial...
In symbolic Machine Learning, the incremental setting allows refining/revising the available model when new evidence proves it is inadequate, instead of learning a new model from scratch. In particular, specialization operators allow revising the model when it covers a negative example. While specialization can be obtained by introducing negated pr...
The rising interest around tractable Probabilistic Graphical Models is due to the guarantees on inference feasibility they provide. Among them, Cutset Networks (CNets) have recently been introduced as models embedding Pearl's cutset conditioning algorithm in the form of weighted probabilistic model trees with tree-structured models as leaves. Learn...