
Fabio Fassetti
- Ph. D.
- University of Calabria
About
85 Publications
7,355 Reads
989 Citations
Publications (85)
Explainable AI refers to techniques by which the reasons underlying decisions taken by intelligent artifacts are singled out and provided to users. Outlier detection is the task of identifying anomalous objects within the data population they belong to. In this paper we propose a new technique to explain why a given data object has been single...
LatentOut is a recently introduced algorithm for unsupervised anomaly detection which enhances latent space-based neural methods, namely (Variational) Autoencoders, GANomaly and ANOGan architectures. The main idea behind it is to exploit both the latent space and the baseline score of these architectures in order...
Reconstruction error-based neural architectures constitute a classical deep learning approach to anomaly detection which has shown strong performance. It consists of training an Autoencoder to reconstruct a set of examples deemed to represent normality and then flagging as anomalies those data that show a sufficiently large reconstruction e...
Motivation
An interesting problem is to study how gene co-expression varies in two different populations, associated with healthy and unhealthy individuals, respectively. To this aim, two important aspects should be taken into account: (i) in some cases, pairs/groups of genes show collaborative attitudes, emerging in the study of disorders and dise...
In recent years, deep learning approaches to anomaly detection have become very popular. In most early methods, the paradigm is to train neural networks initially designed for compression (Auto Encoders) or data generation (GANs) and to detect anomalies as a collateral result. Recently new architectures have been introduced in which the express...
Question Answering (QA) is a critical NLP task mainly based on deep learning models that allow users to ask questions in natural language and get a response. Since available general-purpose datasets are often not effective enough to suitably train a QA model, one of the main problems in this context is related to the availability of datasets whi...
LatentOut is a recently introduced algorithm for unsupervised anomaly detection which enhances latent space-based neural methods, namely (Variational) Autoencoders, GANomaly and ANOGan architectures. The main idea behind it is to exploit both the latent space and the baseline score of these architectures in order to provide a refin...
The scientific impact of researchers is often evaluated based on the citations they receive from others; thus the definition of citation metrics has long been analysed to find criteria able to jointly consider the quantity and the quality of the exchanged citations. Here, we propose a network based approach aimed at estimating the researchers i...
Given a database and one single anomalous data point, the Outlying Aspect Mining problem consists in explaining the abnormality of that data point w.r.t. the data population stored in the input database. Thus, the problem requires the discovery of the sets of attributes and associated values that account for the abnormality of a data point within a...
Tachistoscopes are devices that display a word for several seconds and ask the user to write down the word. They have been widely employed to increase recognition speed, to increase reading comprehension and, especially, to identify reading difficulties and disabilities. Once the therapist is provided with the answers of the patients, a challeng...
Reasoning with minimal models is at the heart of many knowledge representation systems. Yet, it turns out that this task is formidable even when very simple theories are considered. It is, therefore, crucial to devise methods that attain good performances in most cases. To this end, a path to follow is to find ways to break the task at hand into se...
Anomaly detection methods exploiting autoencoders (AE) have shown good performance. Unfortunately, deep non-linear architectures are able to perform high dimensionality reduction while keeping reconstruction error low, thus worsening the outlier detection performance of AEs. To alleviate the above problem, recently some authors have proposed to explo...
In this work we deal with the problem of detecting and explaining anomalous values in categorical datasets. We take the perspective of perceiving an attribute value as anomalous if its frequency is exceptional within the overall distribution of frequencies. As a first main contribution, we provide the notion of frequency occurrence. This measure c...
Active Learning is a machine learning scenario in which methods are trained by iteratively submitting a query to a human expert and then taking into account the expert's feedback for the following computations. The application of this paradigm to the anomaly detection task takes the name of Active Anomaly Detection (AAD). Reinforcement Learning describes a...
Among the XAI (eXplainable Artificial Intelligence) techniques, local explanations are witnessing increasing interest due to the user need to trust specific black-box decisions. In this work we explore a novel local explanation approach applicable to any kind of classifier based on generating masking models. The idea underlying the method is to lear...
In the last few years, the interactions among competing endogenous RNAs (ceRNAs) have been recognized as a key post-transcriptional regulatory mechanism in cell differentiation, tissue development, and disease. Notably, such sponge phenomena subtracting active microRNAs from their silencing targets have been recognized as having a potential oncosu...
Explaining predictions of classifiers is a fundamental problem in eXplainable Artificial Intelligence (XAI). LIME (for Local Interpretable Model-agnostic Explanations) is a recently proposed XAI technique able to explain any classifier by providing an interpretable model which approximates the black-box locally to the instance under consideration....
Datasets from different domains usually contain data defined over a wide set of attributes or features linked through correlation relationships. Moreover, there are some applications in which not all the attributes should be treated in the same fashion, as some of them can be perceived as independent variables that are responsible for the definitio...
Finding outliers in networks is a central task in different application domains. Here, we exploit the stochastic block model framework to study the network from a generative point of view and design a score able to highlight those nodes whose connection with the rest of the network violates in some way the law according to which the rest of the nod...
Enabling information systems to face anomalies in the presence of uncertainty is a compelling and challenging task. In this work the problem of unsupervised outlier detection in large collections of data objects modeled by means of arbitrary multidimensional probability density functions is considered. We present a novel definition of uncertain dis...
This work addresses the problem of helping speech therapists in interpreting results of tachistoscopes. These are instruments widely employed to diagnose speech and reading disorders. Roughly speaking, they work as follows. During a session, some strings of letters, which may or may not correspond to existing words, are displayed to the patient for an...
In this work we deal with the problem of detecting and explaining exceptionally behaving values in categorical datasets. As a first main contribution we provide the notion of frequency occurrence, which can be thought of as a form of Kernel Density Estimation applied to the domain of frequency values. As a second contribution, we define an outlierness me...
Stuttering is a widespread speech disorder involving about the of the population and the of children under the age of 5. Much work in literature studies causes, mechanisms and epidemiology and much work is devoted to illustrate treatments, prognosis and how to diagnose stutter. Relevantly, a stuttering evaluation requires the skills of a multi-dime...
Background
RNA editing is an important mechanism for gene expression in plant organelles. It alters the direct transfer of genetic information from DNA to proteins, due to the introduction of differences between RNAs and the corresponding coding DNA sequences. Software tools that succeed in searching for genes in other organisms are not always able...
The ADBIS conferences provide an international forum for the presentation of research on database theory, development of advanced DBMS technologies, and their applications. The 22nd edition of ADBIS, held on September 2–5, 2018, in Budapest, Hungary, includes six thematic workshops collecting contributions from various domains representing new tren...
This paper proposes a platform for achieving accountability across distributed business processes involving heterogeneous entities that need to establish various types of agreements in a standard way. The devised solution integrates blockchain and digital identity technologies in order to exploit the guarantees about the authenticity of the involve...
The enormous growth of information available in database systems has led to a significant development of techniques for knowledge discovery. At the heart of the knowledge discovery process is the application of data mining algorithms in charge of extracting hidden relationships among pieces of stored information. Information thus extracted from dat...
Tachistoscopes are devices that display a word for several seconds and ask the user to write down the word. They have been widely employed to increase recognition speed, to increase reading comprehension and, especially, to identify reading difficulties and disabilities. Once the therapist is provided with the answers of the patients, a challengi...
In plant mitochondria an essential mechanism for gene expression is RNA editing, often influencing the synthesis of functional proteins. RNA editing alters the linearity of genetic information transfer, introducing differences between RNAs and their coding DNA sequences that hinder both experimental and computational research of genes. Thus common...
Here we consider the problem of mining gene expression data in order to single out interesting features characterizing healthy/unhealthy samples of an input dataset. The presented approach is based on a network model of the input gene expression data, where there is a labeled graph for each sample. This is the first attempt to build a different gr...
Biological networks rely on the storage and retrieval of data associated with the physical interactions and/or functional relationships among different actors. In particular, the attention may be on the interactions among cellular components, such as proteins, genes, RNA, or for example on phenotype–genotype associations. Data from which biological n...
When biological networks are considered, the extraction of interesting knowledge often involves subgraph isomorphism checking, which is known to be NP-complete. For this reason, many approaches try to simplify the problem under consideration by considering structures simpler than graphs, such as trees or paths. Furthermore, the number of existing appro...
This chapter is devoted to a discussion on exceptional pattern discovery, namely on scenarios, contexts, and techniques concerning the mining of patterns which are so rare or so frequent as to be considered exceptional and, then, of interest for an expert to shed light on the domain. Frequent patterns have found broad applications in areas like as...
We show that minimal models of positive propositional theories can be decomposed based on the structure of the dependency graph of the theories. This observation can be useful for many applications involving computation with minimal models. As an example of such benefits, we introduce new algorithms for minimal model finding and checking that are b...
The outlying property detection problem is the problem of discovering the properties distinguishing a given object, known in advance to be an outlier in a database, from the other database objects. In this paper, we analyze the problem within a context where numerical attributes are taken into account, which represents a relevant case left open in...
This work provides a review of biological networks as a model for analysis, presenting and discussing a number of illuminating analyses. Biological networks are an effective model for providing insights about biological mechanisms. Networks with different characteristics are employed for representing different scenarios. This powerful model allows...
We consider the problem of mining gene expression data in order to single out interesting features that characterize healthy/unhealthy samples of an input dataset. We present an approach based on a network model of the input gene expression data, where there is a labelled graph for each sample. To the best of our knowledge, this is the first attem...
We present a technique for node anomaly detection in networks where arcs are annotated with time of creation. The technique aims at singling out anomalies by taking simultaneously into account information concerning both the structure of the network and the order in which connections have been established. The latter information is obtained by time...
We consider the problem of mining gene expression data in order to single out interesting features characterizing healthy/unhealthy samples of an input dataset. We present an approach based on a network model of the input gene expression data, where there is a labelled graph for each sample. To the best of our knowledge, this is the first attempt t...
In this work, we introduce a novel definition of outlier, namely the Gradient Outlier Factor (or GOF), with the aim to provide a definition that unifies with the statistical one on some standard distributions but has a different behavior in the presence of mixture distributions. Intuitively, the GOF score measures the probability to stay in the nei...
In plant mitochondria an essential mechanism for gene expression is RNA editing, often influencing the synthesis of functional proteins. RNA editing alters the linearity of genetic information transfer. Indeed it causes differences between RNAs and their coding DNA sequences that hinder both experimental and computational research of genes. Therefo...
We present a novel definition of outlier whose aim is to embed available domain knowledge in the process of discovering outliers. Specifically, given background knowledge, encoded by means of a set of first-order rules, and a set of positive and negative examples, our approach aims at singling out the examples showing abnormal behavior. The te...
We consider the problem of unsupervised outlier detection in large collections of data objects when objects are modeled by means of arbitrary multidimensional probability density functions. Specifically, we present a novel definition of outlier in the context of uncertain data under the attribute level uncertainty model, according to which an uncer...
Designing algorithms capable of efficiently constructing minimal models of CNFs is an important task in AI. This paper provides new results along this research line and presents new algorithms for performing minimal model finding and checking over positive propositional CNFs and model minimization over propositional CNFs. An algorithmic schema, cal...
Determining a good set of pivots is a challenging task for metric space indexing. Several techniques to select pivots from the data to be indexed have been introduced in the literature. In this paper, we propose a pivot placement strategy which exploits the natural data orientation in order to select space points which achieve a good alignment wit...
We consider the problem of discovering attributes, or properties, accounting for the a priori stated abnormality of a group of anomalous individuals (the outliers) with respect to an overall given population (the inliers). To this aim, we introduce the notion of exceptional property and define the concept of exceptionality score, which measures the...
This work deals with the problem of classifying uncertain data. With this aim we introduce the Uncertain Nearest Neighbor (UNN) rule, which represents the generalization of the deterministic nearest neighbor rule to the case in which uncertain objects are available. The UNN rule relies on the concept of nearest neighbor class, rather than on that o...
In this study, we deal with the problem of efficiently answering range queries over uncertain objects in a general metric space. In this study, an uncertain object is an object that always exists but its actual value is uncertain and modeled by a multivariate probability density function. As a major contribution, this is the first work providing an...
We present L-SME, a system to efficiently identify loosely structured motifs in genome-wide applications. L-SME is innovative in three aspects. Firstly, it handles wider classes of motifs than earlier motif discovery systems, by supporting box swaps and skips in the motif structure as well as various kinds of similarity functions. Secondly, in a...
This work deals with the problem of classifying uncertain data. With this aim the Uncertain Nearest Neighbor (UNN) rule is here introduced, which represents the generalization of the deterministic nearest neighbor rule to the case in which uncertain objects are available. The UNN rule relies on the concept of nearest neighbor class, rather than on...
Plants have played a special role in inositol polyphosphate (IP) research since the first IP, the fully phosphorylated inositol ring of phytic acid (IP6), was discovered in plant seeds. It is now known that phytic acid is further metabolized by the IP6 Kinases (IP6Ks) to generate IPs containing a pyrophosphate moiety. The IP6Ks are evolutionarily conserv...
Accession numbers of the genes referred to in the figures.
IP6 Kinases (IP6Ks) are important mammalian enzymes involved in inositol phosphates metabolism. Although IP6Ks have not yet been identified in plant chromosomes, there are many clues suggesting that the corresponding gene might be found in plant mtDNA, encrypted and hidden by virtue of editing and/or trans-splicing processes. In this paper, we prop...
A new technique, SNIPER, is proposed for learning a model that deals with continuous values of exceptionality. Specifically, given some training objects associated with a continuous attribute F, SNIPER induces a rule-based model for the identification of those objects likely to score the maximum values for F. The purpose of SNIPER differs from the...
This work proposes a method for detecting distance-based outliers in data streams under the sliding window model. The novel notion of one-time outlier query is introduced in order to detect anomalies in the current window at arbitrary points in time. Three algorithms are presented. The first algorithm answers outlier queries exactly, but has lar...
Assume a population partitioned in two subpopulations, e.g. a set of normal individuals and a set of abnormal individuals, is given. Assume, moreover, that we look for a characterization of the reasons discriminating one subpopulation from the other. In this paper, we provide a technique by which such evidence can be mined, by introducing the no...
Head-elementary-set-free (HEF) programs were proposed in (Gebser et al. 2007) and shown to generalize over head-cycle-free programs while retaining their nice properties. It was left as an open problem in (Gebser et al. 2007) to establish the complexity of identifying HEF programs. This note solves the open problem by showing that the problem is co...
In this paper we describe an experience resulting from the collaboration among data mining researchers, domain experts of the Italian revenue agency, and IT professionals, aimed at detecting fraudulent VAT credit claims. The outcome is an auditing methodology based on a rule-based system, which is capable of trading among conflicting issues, such a...
We present a novel definition of outlier in the context of inductive logic programming. Given a set of positive and negative examples, the definition aims at singling out the examples showing anomalous behavior. We note that the task here pursued is different from noise removal, and, in fact, the anomalous observations we discover are different in...
Assume you are given a data population characterized by a certain number of attributes. Assume, moreover, you are provided with the information that one of the individuals in this data population is abnormal, but no reason whatsoever is given to you as to why this particular individual is to be considered abnormal. In several cases, you will be ind...
In this work a novel distance-based outlier detection algorithm, named DOLPHIN, working on disk-resident datasets and whose I/O cost corresponds to the cost of sequentially reading the input dataset file twice, is presented. It is both theoretically and empirically shown that the main memory usage of DOLPHIN amounts to a small fraction of the datas...
The discovery of information encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually encoded in patterns frequently occurring in the sequences, also called motifs. In fact, motif discovery has received much attention in the literature, and sev...
In this work we propose an unsupervised data cleaning method whose goal is to single out possibly erroneous entries in a database with textual fields. The method is particularly useful when no domain information is available about the correctness of the individual entries. With this aim, an unsupervised outlier detection-like technique is proposed...
In this work a method for detecting distance-based outliers in data streams is presented. We deal with the sliding window model, where outlier queries are performed in order to detect anomalies in the current window. Two algorithms are presented. The first one exactly answers outlier queries, but has larger space requirements. The second...
In this work a novel algorithm, named DOLPHIN, for detecting distance-based outliers is presented. The proposed algorithm performs only two sequential scans of the dataset. It needs to store into main memory a portion of the dataset, to efficiently search for neighbors and early prune inliers. The strategy pursued by the algorithm allows to keep th...
Functional dependencies (FDs) are an integral part of relational database theory since they are used in integrity enforcement and in database design. Despite their importance FDs are often not specified or some of them are not expected by database designers, but they occur in the data and the need of inferring them from data arises. Furthermore, in...
Functional dependencies (FDs) are an integral part of database theory since they are used in integrity enforcement and in database design. Recently, functional dependencies satisfied by XML data (XFDs) have been introduced. In this work approximate functional dependencies that are XFDs approximately satisfied by a considerable part of the XML d...
In the last few years, the completion of the human genome sequencing showed up a wide range of new challenging issues involving raw data analysis. In particular, the discovery of information implicitly encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This informat...