About
88 Publications
7,352 Reads
637 Citations
Additional affiliations
February 2009 - July 2016
Publications (88)
Quantum machine learning has recently gained prominence due to the promise of quantum computers in solving machine learning problems that are intractable on a classical computer. Nevertheless, several studies on problems which remain challenging for classical computing algorithms are emerging. One of these is classifying continuously incoming data inst...
Crowdfunding has evolved into a formidable mechanism for collective financing, challenging traditional funding sources such as bank loans, venture capital, and private equity with its global reach and versatile applications across various sectors. This paper explores the complex dynamics of crowdfunding platforms, particularly focusing on investor...
Dynamic networks are ubiquitous in many domains for modelling evolving graph-structured data, and detecting changes in them allows us to understand the dynamics of the represented domain. One category of computational solutions is the pattern-based change detectors (PBCDs), which are non-parametric unsupervised change detection methods based on...
Crowdfunding has become a popular way for entrepreneurs and companies to raise capital. However, increasing competition in the crowdfunding market has led to the need for predictive models to estimate the success or failure of crowdfunding campaigns. This study presents preliminary results on predicting investor behavior and investment patterns in...
Social media have enabled users not only to maintain interpersonal relationships but also to voice personal sensations, emotions and feelings. Recent literature reports on the potential of technologies based on emotion detection and analysis. However, understanding user-generated emotional content is a challenging task because it requires the extra...
Deep neural network architectures have recently achieved state-of-the-art results in learning flexible and effective intrusion detection models. Since attackers constantly use new attack vectors to avoid detection, concept drift commonly occurs in network traffic, degrading the effectiveness of the detection model over time, even when deep neural n...
Traditional process mining approaches learn process models assuming that processes are in a steady state. This does not match the flexibility and adaptation often required of information systems and business models. In fact, these approaches should discover variations in order to adapt to new circumstances, which is a peculiarity that conventional cha...
Change mining is one of the main subjects of analysis on time-evolving data. Regardless of how the changes are distributed over the data, algorithms often return very large result sets. In fact, one class of algorithms designed for change mining is based on pattern mining, which notoriously suffers from the problem of a huge number of retur...
Advances in the Internet of Things (IoT) have fostered the development of new technologies to sense and monitor urban scenarios. Specifically, Mobile Crowd Sensing (MCS) is one suitable solution because it easily enables the integration of smartphones that collect massive, ubiquitous data at relatively low cost. However, MCS can be...
Networks have had an increasing impact on modern life, so network cybersecurity has become an important research field. Several machine learning techniques have been developed to build network intrusion detection systems that correctly detect unforeseen cyber-attacks at the network level. For example, deep artificial neural network architectures hav...
Communication networks are inherently dynamic, and changes are often due to unpredicted causes, for instance device failures or bursts of user requests. To guarantee service continuity, providers should keep the typical network control and management activities aligned with these changes. They shoul...
This book discusses the challenges facing current research in knowledge discovery and data mining posed by the huge volumes of complex data now gathered in various real-world applications (e.g., business process monitoring, cybersecurity, medicine, language processing, and remote sensing). The book consists of 14 chapters covering the latest resear...
This book constitutes the refereed post-conference proceedings of the 8th International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2019, held in conjunction with ECML-PKDD 2019 in Würzburg, Germany, in September 2019.
The workshop focused on the latest developments in the analysis of complex and massive data sources, such as blogs,...
Pattern-based change detection (PBCD) describes a class of change detection algorithms for evolving data. Unlike conventional solutions, PBCD seeks changes exhibited by patterns over time and therefore works on an abstract form of the data, which avoids searching for changes in the raw data. Moreover, PBCD provides arguments on the vali...
Social media allow users to convey emotions, which are often related to real-world events, social relationships or personal experiences. Indeed, emotions can determine the propensity of users to socialize or attend events. Similarly, interactions with people can influence the personality and feelings of individuals. Therefore, studying emotion...
Urban pollution is usually monitored via fixed stations that provide detailed and reliable information, thanks to equipment quality and effective measuring protocols, but these sampled data are gathered from very limited areas and through discontinuous monitoring campaigns. Currently, the spread of mobile devices has fostered the development of new...
Intrusion Detection Systems aim to address the problem of correctly identifying unforeseen network attacks. The attack detection problem has already been tackled through supervised and unsupervised machine learning approaches. While the former lead to models that are very accurate on already-seen samples, the latter provide models that are robust on unforese...
Besides facilitating interaction, the communication technologies available on social media platforms, such as forums and instant messaging, open up the possibility of conveying and expressing emotions and feelings. Emotions and social relationships are often connected; indeed, emotions and feelings can make users favorable or reluctant to soc...
Human emotion analysis has always stimulated studies in different disciplines, such as cognitive science and psychology, and, thanks to the spread of social media, it is now attracting the interest of computer scientists too. In particular, the growing popularity of microblogging platforms has generated large amounts of information which in turn r...
Active learning is a promising machine learning paradigm for querying oracles and obtaining actual labels for particular examples. Its goal is to reduce the number of labels needed to learn a predictive model that achieves a high level of accuracy. It may turn out to be advantageous in several regression problems where scarce labels c...
Network data streams are unbounded sequences of complex data, produced at a high rate, which represent complex systems that evolve continuously over time. In this scenario, a problem worth studying is the analysis of changes, which may concern a complex system as a whole or small parts of it. In this paper, these are distinguished into mac...
Sequence mining is one of the most investigated tasks in data mining, and it has been studied from several perspectives. With the rise of Big Data technologies, efficiency has become a prominent concern, especially when mining massive sequences. In this paper, we perform a thorough experimental evaluation of several algorithms for sequential p...
Advances in tracking technology enable the gathering of spatio-temporal data in the form of trajectories, which, when analysed, can convey useful knowledge. In particular, discovering groups of moving objects is a valuable means for a wide class of problems related to mobility. The task of group mining has been investigated by considering mostly the...
Recently, evolving networks have become a suitable way to model many real-world complex systems, owing to their ability to represent the systems and their constituent entities, the interactions between those entities, and the time variability of their structure and properties. Designing computational models able to analyze evolving networks bec...
Climate changes have always attracted interest because they may have a great impact on life on Earth and on living beings. Computational solutions may be useful both for predicting climate changes and for characterizing them, possibly in association with other phenomena. Due to the cyclic and seasonal nature of many climate processe...
Temporal data describe processes and phenomena that evolve over time. In many real-world applications, temporal data are characterized by temporal autocorrelation, which expresses the dependence of time-stamped data over a certain time lag. Often such processes and phenomena are characterized by evolving complex entities, which we can represent wi...
This book features a collection of revised and significantly extended versions of the papers accepted for presentation at the 5th International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2016, held in conjunction with ECML-PKDD 2016 in Riva del Garda, Italy, in September 2016. The book is composed of five parts: feature selection a...
The advancement of information technologies in many real-world applications has opened up the possibility of tracking complex, evolving phenomena and gathering information that describes them. For instance, in biomedical applications, we can monitor a patient and collect data that range from his clinical picture to the laboratory s...
This book constitutes the thoroughly refereed post-conference proceedings of the 4th International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2015, held in conjunction with ECML-PKDD 2015 in Porto, Portugal, in September 2015.
The 15 revised full papers presented together with one invited talk were carefully reviewed and selected f...
The phenotype is the result of the expression of a genotype in a given environment. Genetic, and possibly protein, mutations and/or environmental changes may affect biological homeostasis, turning a normal phenotype into a pathological one. Studying phenotype alterations over time thus becomes relevant and even determinant whe...
Sensor networks, communication and financial networks, web and social networks are becoming increasingly important in our day-to-day life. They contain entities which may interact with one another. These interactions are often characterized by a form of autocorrelation, where the value of an attribute at a given entity depends on the values at the...
Document summarization involves reducing a text document into a short set of phrases or sentences that convey the main meaning of the text. In digital libraries, summaries can be used as concise descriptions which the user can read for a rapid comprehension of the retrieved documents. Most of the existing approaches rely on the classification algor...
Networks are data structures increasingly used for modeling interactions in social and biological phenomena, as well as between various types of devices, tools and machines. They can be either static or dynamic, depending on whether the modeled interactions are fixed or change over time. Static networks have been extensively inves...
The detection of congested areas can play an important role in the development of traffic management systems. Usually, the problem is investigated from two main perspectives, concerning the representation of space and the shape of the dense regions, respectively. However, the adoption of movement tracking technologies enables the generation of...
In predictive data mining tasks, we should account for autocorrelations of both the independent variables and the dependent variable, which we can observe in the neighborhood of a target node and at that same node. The prediction for a target node should be based on the values of its neighbours, which might even be unavailable. To address this problem, the v...
Recent developments in technology and the life sciences have paved the way for complex interactions among entities in distributed and heterogeneous environments. As a result, an enormous amount of valuable information is available, ranging from structured to multimedia and spatial or spatio-temporal data. The data mining research community has be...
Recent advances in tracking technologies enable the collection of spatio-temporal data in the form of trajectories. The analysis of such data can convey knowledge in prominent applications, and mining groups of moving objects turns out to be a valuable means to model their movement. Existing approaches pay particular attention to groups where object...
In predictive data mining tasks, we should account for autocorrelations of both the independent variables and the dependent variable, which we can observe in the neighborhood of a target node and at that same node. The prediction for a target node should be based on the values of its neighbours, which might even be unavailable. To address this problem, th...
This book constitutes the thoroughly refereed post-conference proceedings of the Second International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2013, held in conjunction with ECML/PKDD 2013 in Prague, Czech Republic, in September 2013. The 16 revised full papers were carefully reviewed and selected from numerous submissions. The p...
Linking biomedical concepts is one of the tasks of literature-based discovery and makes it possible to identify interesting, hidden relations between seemingly unconnected concepts or entities. Most existing approaches rely on the assumption that the data and the underlying literature are static, or treat them as unchangeable domains. While scientific litera...
Background
microRNAs (miRNAs) are a class of small non-coding RNAs which have been recognized as ubiquitous post-transcriptional regulators. The analysis of interactions between different miRNAs and their target genes is necessary for the understanding of miRNAs' role in the control of cell life and death. In this paper we propose a novel data mini...
MicroRNAs (miRNAs) are a class of small non-coding RNAs which have been recognized as ubiquitous post-transcriptional regulators. The analysis of the interactions between miRNAs and their target messenger RNAs (mRNAs) can contribute to the understanding of miRNAs' role in the control of cell life and death. In this paper we present a novel bicluste...
In Document Image Understanding, one of the fundamental tasks is recognizing semantically relevant components in the layout extracted from a document image. This process can be automated by learning classifiers able to label such components automatically. However, the learning process assumes the availability of a huge set of documents wh...
This book constitutes the thoroughly refereed conference proceedings of the First International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2012, held in conjunction with ECML/PKDD 2012, in Bristol, UK, in September 2012.
The 15 revised full papers were carefully reviewed and selected from numerous submissions. The papers are organi...
This paper addresses the problem of harvesting geographic information from Web documents, specifically extracting facts on spatial relations among geographic places. The motivation is twofold. First, researchers in Spatial Data Mining often assume that spatial data are already available, thanks to current GIS and positioning technologies. Nevertheless...
Most work on learning from networked data assumes that the network is static. In this paper we consider a different scenario, where the network is dynamic, i.e., nodes and relationships can be added or removed and relationships can change type over time. We assume that the "core" of the network is more stable than the "marginal" part of...
In this paper, we address the problem of extracting spatial relationships between geographical entities mentioned in textual documents. This is part of a research project which aims to geo-reference document contents, thus making it possible to build a Geographical Information Retrieval system. The driving factor of this research is the huge am...
Motivations
microRNAs (miRNAs) are post-transcriptional regulators which represent one of the major regulatory gene families in animals, plants and viruses, and which play a key role in almost all major cellular processes. The computational prediction of miRNA target genes is important for the functional annotation of genomes and, on the other hand,...
microRNAs (miRNAs) are an important class of regulatory factors controlling gene expression at the post-transcriptional level. Studies on interactions between different miRNAs and their target genes are of utmost importance to understand the role of miRNAs in the control of biological processes. This paper contributes to these studies by proposing a m...
A fundamental task of document image understanding is to recognize semantically relevant components in the layout extracted from a document image. This task can be automated by learning classifiers to label such components. The application of inductive learning algorithms assumes the availability of a large set of documents, whose layout componen...
Bisociations represent interesting relationships between seemingly unconnected concepts from two or more contexts. Most existing approaches to discovering bisociations from data rely on the assumption that contexts are static, or treat them as unchangeable domains. In fact, several real-world domains are intrinsically dynamic a...
Longitudinal data consist of the repeated measurements of some variables which describe a process (or phenomenon) over time. They can be analyzed to unearth information on the dynamics of the process. In this paper we propose a temporal data mining framework to analyze these data and acquire knowledge, in the form of temporal patterns, on the event...
The automatic discovery of process models can help to gain insight into various perspectives (e.g., control flow or data perspective) of the process executions traced in an event log. Frequent pattern mining offers a means to build human-understandable representations of these process models. This paper describes the application of a multi-relatio...
Technologies in existing biomedical repositories do not yet provide adequate mechanisms to support the understanding and analysis of the stored content. In this project we investigate this problem from different perspectives. Our contribution is the design of computational solutions for the analysis of biomedical documents and images. These integ...
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. In this pro...
The automatic discovery of process models can help to gain insight into various perspectives (e.g., control flow or data perspective) of the process executions traced in an event log. Frequent pattern mining offers a means to build human-understandable representations of these process models. This paper describes the application of a multi-rela...
Motif discovery in biological sequences is an important field in bioinformatics. Most scientific research focuses on the de novo discovery of single motifs, but biological activities are typically co-regulated by several factors, and this feature is properly reflected by higher-order structures, called composite motifs, or cis-regulatory modu...
Traditional pattern discovery approaches make it possible to identify frequent patterns expressed in the form of conjunctions of items, which represent their frequent co-occurrences. Although such approaches have proved effective in descriptive knowledge discovery tasks, they can miss interesting combinations of items which do not necessarily occur toget...
A key task in data mining and information retrieval is learning preference relations. Most methods reported in the literature learn preference relations between objects which are represented by attribute-value pairs or feature vectors (propositional representation). The growing interest in data mining techniques which are able to directly deal w...
Analyzing physiological data can be of great importance in unearthing information on the course of a disease. In this paper we propose a data mining approach to analyze these data and acquire knowledge, in the form of temporal patterns, on the physiological events which can frequently trigger particular stages of disease. The application to the sle...
The discovery of new and potentially meaningful relationships between named entities in biomedical literature can benefit greatly from applying multi-relational data mining approaches to text mining. This is motivated by the ability of multi-relational data mining to express and manipulate relationships between entities...
We resort to preference learning in order to address the problem of acquiring necessary knowledge in two distinct steps of the document image analysis process: 1) reading order detection, and 2) document summarization. We advocate a relational approach for both cases and we propose a probabilistic relational learning method. Experiments on real...
In spatial domains, objects present high heterogeneity and are connected by several relationships to form complex networks. Mining spatial networks can provide information on both the objects and their interactions. In this work we propose a descriptive data mining approach to discover relational disjunctive patterns in spatial networks. Relational...
Novelty detection in data stream mining denotes the identification of new or unknown situations in a stream of data elements flowing continuously at a rapid rate. This work is a first attempt at investigating the anomaly detection task in (multi-)relational data mining. By defining a data block as the collection of complex data which perio...
Clinical Practice Guidelines support decision making in problems such as diagnosis and prevention for specific clinical circumstances. They are usually available as textual documents written in natural language, whose interpretation, however, can make their implementation difficult. Additionally, the high number of availabl...
We address the problem of novelty detection from stream data, that is, the identification of new or unknown situations in an ordered sequence of objects which arrive on-line, at consecutive time points. We extend previous solutions by considering the case of objects modeled by multiple database relations. Frequent relational patterns are efficiently e...
Association rules (AR) are a class of patterns which describe regularities in a set of transactions. When the items of transactions are organized in a taxonomy, AR can be associated with a level of the taxonomy, since they contain only items at that level. A drawback of multiple-level AR mining is the generation of redundant rules which d...
Physiological data represent the health conditions of a patient over time. They can be analyzed to gain knowledge on the course of a disease or, more generally, on the physiology of a patient. Typical approaches rely on background medical knowledge to track or recognize single stages of the disease. However, when no domain knowledge is availabl...
Results of the first mining step on regulatory RNA motifs in UTRminer. The data reported in the table give a general overview of the results of the first mining step runs. The percentage of each INIT detected with respect to the total sample is shown. The order of RNA target sites in the INIT rows of the table is neither indicative of the order of the target sites al...
Many studies report on the detection and functional characterization of cis-regulatory motifs in untranslated regions (UTRs) of mRNAs, but little is known about the nature and functional role of their distribution. To address this issue, we have developed a computational approach based on data mining techniques. The idea is that of mining f...
Longitudinal data consist of repeated measurements of some variables which describe the dynamics of a domain (process or phenomenon) over time. They can be analyzed in order to explain what event may cause the transition from one state to the next during the evolution of the domain. Generally, approaches to this explanation problem rely on t...
A data stream is a sequence of time-stamped data elements which arrive on-line, at consecutive time points. In this work we propose a multi-relational approach to mine complex data streams in order to identify novelty patterns which target new or unknown situations in the stream. Multi-relational data mining is motivated by the existence of several...
The inference of explanations is a problem typically studied in the field of Temporal Reasoning by means of approaches related to reasoning about action and change, which usually aim to infer statements that explain a given change. Most proposed works are based on inferential logic mechanisms that assume the existence of a general domain kno...
Generalized association rules are an important extension of traditional association rules which makes it possible to exploit taxonomic knowledge defined over the items to be mined. However, when a taxonomy is used, several thousand rules are discovered, and most of them can be redundant. In this paper, we propose a solution to the problem of mining non re...
Time-series segmentation has been widely discussed and successfully applied in a variety of areas, including computational genomics, telecommunications and process monitoring. Nevertheless, few techniques have been devised to deal with multidimensional evolving data describing complex objects. Moreover, in many applica...