Michelangelo Ceci

University of Bari Aldo Moro | Università di Bari · Department of Computer Science

PhD

About

254
Publications
44,602
Reads
3,155
Citations
Additional affiliations
January 2005 - October 2015
University of Bari Aldo Moro
Position
  • Professor (Assistant)

Publications (254)
Article
Full-text available
Cryptocurrencies are virtual currencies that exploit cryptography to perform secure financial transactions. They gained widespread popularity in recent years due to their decentralized nature, (pseudo-)anonymity, and ability to facilitate cross-border transactions without the need for intermediaries. However, their price on the market exhibits a hu...
Article
Full-text available
Dynamic networks are ubiquitous in many domains for modelling evolving graph-structured data, and detecting changes allows us to understand the dynamics of the domain represented. A category of computational solutions is represented by the pattern-based change detectors (PBCDs), which are non-parametric unsupervised change detection methods based on...
Article
Full-text available
Forecasting methods are important decision support tools in geo-distributed sensor networks. However, challenges such as the multivariate nature of data, the existence of multiple nodes, and the presence of spatio-temporal autocorrelation increase the complexity of the task. Existing forecasting methods are unable to address these challenges in a c...
Article
Full-text available
Background: Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or learn predictive models. However, data to be fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each diff...
Article
Full-text available
Semi-supervised learning (SSL) is a common approach to learning predictive models using not only labeled, but also unlabeled examples. While SSL for the simple tasks of classification and regression has received much attention from the research community, this is not the case for complex prediction tasks with structurally dependent variables, such...
Article
Existing data engine implementations do not properly manage the conflict between the need to protect data and the need to share it, which is hampering the spread of big data applications and limiting their impact. These two requirements have often been studied and defined independently, leading to a conceptual and technological misalignment. This article pre...
Article
Full-text available
The identification of anomalous activities is a challenging and crucially important task in sensor networks. This task is becoming increasingly complex with the increasing volume of data generated in real-world domains, and greatly benefits from the use of predictive models to identify anomalies in real time. A key use case for this task is the ide...
Article
Full-text available
The human microbiome has become an area of intense research due to its potential impact on human health. However, the analysis and interpretation of this data have proven to be challenging due to its complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationship...
Chapter
Explainable AI (XAI) focuses on designing inference explanation methods and tools to complement machine learning and black-box deep learning models. Such capabilities are crucially important with the rising adoption of AI models in real-world applications, which require domain experts to understand how model predictions are extracted in order to ma...
Article
Full-text available
The rapid development of machine learning (ML) techniques has opened up the data-dense field of microbiome research for novel therapeutic, diagnostic, and prognostic applications targeting a wide range of disorders, which could substantially improve healthcare practices in the era of precision medicine. However, several challenges must be addressed...
Article
Full-text available
The massive adoption of social networks increased the need to analyze users’ data and interactions to detect and block the spread of propaganda and harassment behaviors, as well as to prevent actions influencing people towards illegal or immoral activities. In this paper, we propose HURI, a method for social network analysis that accurately classif...
Chapter
Anomaly detection is a machine learning task that has been investigated within diverse research areas and application domains. In this paper, we performed anomaly detection for Physical Threat Intelligence. Specifically, we performed anomaly detection for air pollution and public transport traffic analysis for the city of Oslo, Norway. To this aim,...
Article
The massive spread of social networks provided a plethora of new possibilities to communicate and interact worldwide. On the other hand, they introduced some negative phenomena related to social media addictions, as well as additional tools for cyberbullying and cyberterrorism activities. Therefore, monitoring operations on the posted contents and...
Chapter
Smart grids are networks that distribute electricity by relying on advanced communication technologies, sensor measurements, and predictive methods, to quickly adapt the network behavior to different possible scenarios. In this context, the adoption of machine learning approaches to forecast the customer energy consumption is essential to optimize...
Preprint
Full-text available
Semi-supervised learning (SSL) is a common approach to learning predictive models using not only labeled examples, but also unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention from the research community, this is not properly investigated for complex prediction tasks with structurally...
Article
Full-text available
As the complexity of data increases, so does the importance of powerful representations, such as relational and logical representations, as well as the need for machine learning methods that can learn predictive models in such representations. A characteristic of these representations is that they give rise to a huge number of features to be consid...
Article
In many real-world domains, data can naturally be represented as networks. This is the case of social networks, bibliographic networks, sensor networks and biological networks. Some dynamism often characterizes these networks as their structure (i.e., nodes and edges) continually evolves. Considering this dynamism is essential for analyzing these n...
Article
Full-text available
Matrix tri-factorization subject to binary constraints is a versatile and powerful framework for the simultaneous clustering of observations and features, also known as biclustering. Applications for biclustering encompass the clustering of high-dimensional data and explorative data mining, where the selection of the most important features is rele...
Article
Full-text available
Social Media have enabled users to keep inter-personal relationships, but also to voice personal sensations, emotions and feelings. The recent literature reports on the potential of technologies based on emotion detection and analysis. However, the understanding of user generated emotional content is a challenging task because it requires the extra...
Chapter
The huge amount of data generated by sensor networks enables many potential analyses. However, one important limiting factor for the analyses of sensor data is the possible presence of anomalies, which may affect the validity of any conclusion we could draw. This aspect motivates the adoption of a preliminary anomaly detection method. Existing meth...
Article
Motivation: Gene regulation is responsible for controlling numerous physiological functions and dynamically responding to environmental fluctuations. Reconstructing the human network of gene regulatory interactions is thus paramount to understanding the cell functional organisation across cell types, as well as to elucidating pathogenic processes...
Article
Full-text available
In an era characterized by fast technological progress that introduces new unpredictable scenarios every day, working in the law field may appear very difficult, if not supported by the right tools. In this respect, some systems based on Artificial Intelligence methods have been proposed in the literature, to support several tasks in the legal sect...
Article
Full-text available
A sitemap represents an explicit specification of the design concept and knowledge organization of a website and is therefore considered as the website’s basic ontology. It not only presents the main usage flows for users, but also hierarchically organizes concepts of the website. Typically, sitemaps are defined by webmasters in the very early stag...
Article
Full-text available
The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data, supported by machine learning approaches, has received increasing attention in recent years. The task at hand is to identify regulatory links between genes in a network. However, existing methods often suffer when the number of labeled examples is low or when no negati...
Article
Full-text available
We address the task of learning ensembles of predictive models for structured output prediction (SOP). We focus on three SOP tasks: multi-target regression (MTR), multi-label classification (MLC) and hierarchical multi-label classification (HMC). In contrast to standard classification and regression, where the output is a single (discrete or contin...
Chapter
Traditional process mining approaches learn process models assuming that processes are in steady-state. This does not comply with the flexibility and adaptation often requested for information systems and business models. In fact, these approaches should discover variations to adapt to new circumstances, which is a peculiarity that conventional cha...
Article
Full-text available
The increasing presence of renewable energy plants has created new challenges such as grid integration, load balancing and energy trading, making it fundamental to provide effective prediction models. Recent approaches in the literature have shown that exploiting spatio-temporal autocorrelation in data coming from multiple plants can lead to better...
Article
Change mining is one of the main subjects of analysis on time-evolving data. Regardless of the distribution of the changes over the data, often the algorithms return very large sets of results. In fact, one class of algorithms designed for change mining is based on pattern mining, which notoriously suffers from the problem of a huge number of retur...
Chapter
With data becoming more and more complex, the standard tabular data format often does not suffice to represent datasets. Richer representations, such as relational ones, are needed. However, a relational representation opens a much larger space of possible descriptors (features) of the examples that are to be classified. Consequently, it is importa...
Article
Full-text available
Smart grids are power grids where clients may actively participate in energy production, storage and distribution. Smart grid management raises several challenges, including the possible changes and evolutions in terms of energy consumption and production, that must be taken into account in order to properly regulate the energy distribution. In thi...
Article
Full-text available
Gene network reconstruction is a bioinformatics task that aims at modelling the complex regulatory activities that may occur among genes. This task is typically solved by means of link prediction methods that analyze gene expression data. However, the reconstructed networks often suffer from a high amount of false positive edges, which are actually...
Article
Despite the ease of collecting an abundance of data about various phenomena, obtaining the labeled data needed for learning models with high predictive performance remains a difficult and expensive task in many domains. This issue is particularly present in the case of the analysis of scientific data where obtaining labeled data typically requires expensi...
Article
Full-text available
Gravitational waves represent a new opportunity to study and interpret phenomena from the universe. In order to efficiently detect and analyze them, advanced and automatic signal processing and machine learning techniques could help to support standard tools and techniques. Another challenge relates to the large volume of data collected by the dete...
Article
Full-text available
Background. The study of functional associations between ncRNAs and human diseases is a pivotal task of modern research to develop new and more effective therapeutic approaches. Nevertheless, it is not a trivial task since it involves entities of different types, such as microRNAs, lncRNAs or target genes whose expression also depends on endogenous...
Chapter
Full-text available
Communication networks are inherently dynamic and the changes are often due to unpredicted causes, for instance, failures of the devices or bulks of user requests. To guarantee the continuation of the services, the providers should keep the typical activities of control and management of the network aligned with respect to these changes. They shoul...
Book
This book discusses the challenges facing current research in knowledge discovery and data mining posed by the huge volumes of complex data now gathered in various real-world applications (e.g., business process monitoring, cybersecurity, medicine, language processing, and remote sensing). The book consists of 14 chapters covering the latest resear...
Book
This book constitutes the thoroughly refereed proceedings of the 16th Italian Research Conference on Digital Libraries, IRCDL 2020, held in Bari, Italy, in January 2020. The 12 full papers and 6 short papers presented were carefully selected from 26 submissions. The papers are organized in topical sections on information retrieval, big data and dat...
Book
This book constitutes the refereed post-conference proceedings of the 8th International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2019, held in conjunction with ECML-PKDD 2019 in Würzburg, Germany, in September 2019. The workshop focused on the latest developments in the analysis of complex and massive data sources, such as blogs,...
Chapter
Next activity prediction is one of the most important problems concerning the operational monitoring of processes, that is, supporting the user in predicting the activity that will be executed as the next step during process execution. However, traditional algorithms do not cope with the presence of parallel activities, thus failing to devise accur...
Article
Pattern-based change detection (PBCD) describes a class of change detection algorithms for evolving data. Contrary to conventional solutions, PBCD seeks changes exhibited by the patterns over time and therefore works on an abstract form of the data, which prevents the search for changes on the raw data. Moreover, PBCD provides arguments on the vali...
Chapter
Pattern-based change detectors (PBCDs) are non-parametric unsupervised change detection methods that are based on observed changes in sets of frequent patterns over time. In this paper we study PBCDs for dynamic networks; that is, graphs that change over time, represented as a stream of snapshots. Accurate PBCDs rely on exhaustively mining sets of...
Article
Motivation: The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data has received increasing attention in recent years, due to its usefulness in the understanding of regulatory mechanisms involved in human diseases. Most of the existing methods reconstruct the network through machine learning approaches, by analyzing known e...
Article
Full-text available
Recent developments in sensor networks and mobile computing led to a huge increase in data generated that need to be processed and analyzed efficiently. In this context, many distributed data mining algorithms have recently been proposed. Following this line of research, we propose the DENCAST system, a novel distributed algorithm implemented in Ap...
Article
Full-text available
In renewable energy forecasting, data are typically collected by geographically distributed sensor networks, which poses several issues. (i) Data represent physical properties that are subject to concept drift, i.e., their characteristics could change over time. To address the concept drift phenomenon, adaptive online learning methods should be con...
Article
Full-text available
The increasing presence of geo-distributed sensor networks implies the generation of huge volumes of data from multiple geographical locations at an increasing rate. This raises important issues which become more challenging when the final goal is that of the analysis of the data for forecasting purposes or, more generally, for predictive tasks. Th...
Article
Full-text available
The Growing Hierarchical Self-Organizing Map (GHSOM) algorithm has shown its potential for performing several tasks such as exploratory analysis, anomaly detection and forecasting on a variety of domains including the financial and cyber-security domains. GHSOM is a dynamic variant of the SOM algorithm which generates a multi-level hierarchy of SOM...
Conference Paper
Human emotion analysis has always stimulated studies in different disciplines, such as Cognitive Sciences and Psychology, and, thanks to the diffusion of social media, it is attracting the interest of computer scientists too. In particular, the growing popularity of microblogging platforms has generated large amounts of information which in turn r...
Article
Network data streams are unbounded sequences of complex data produced at a high rate, which represent complex systems that evolve continuously over time. In this scenario, a problem worthy of being studied is the analysis of the changes, which may concern a complex system as a whole or small parts of it. In this paper, these are distinguished into mac...
Conference Paper
Process mining is a research discipline that aims to discover, monitor and improve real processes using event logs. In this paper we tackle the problem of next activity prediction/recommendation via "nested prediction model" learning, that is, we first identify recurrent and frequent sequences of activities and then we learn a prediction model for...
Chapter
The growing trend of Big Data drives additional demand for novel solutions and specifically designed algorithms that perform efficient Big Data filtering and processing, recently even in real time. Thus, the necessity to scale up Machine Learning algorithms to larger datasets and more complex methods should be addressed by distribute...
Chapter
The aim of this article is to synthetically describe a sample of distinct approaches and applications of Relational Data Mining, which address the issue of managing complex, and possibly big, amounts of data. Specifically, we report a brief review of the literature on Relational Data Mining in the fields of Spatial Data Mining, Process Mining, Netw...
Article
Full-text available
Heterogeneous networks are networks consisting of different types of objects and links. They can be found in several fields, ranging from the Internet to social sciences, biology, epidemiology, geography and finance. Several methods have already been proposed for the analysis of network data, but they usually focus on homogeneous networks, where ob...
Article
The predictive performance of traditional supervised methods heavily depends on the amount of labeled data. However, obtaining labels is a difficult process in many real-life tasks, and only a small amount of labeled data is typically available for model learning. As an answer to this problem, the concept of semi-supervised learning has emerged. Se...
Conference Paper
Sequence mining is one of the most investigated tasks in data mining and it has been studied under several perspectives. With the rise of Big Data technologies, the perspective of efficiency becomes prominent especially when mining massive sequences. In this paper, we perform a thorough experimental evaluation of several algorithms for sequential p...
Article
Full-text available
Heterogeneous information networks consist of different types of objects and links. They can be found in several social, economic and scientific fields, ranging from the Internet to social sciences, including biology, epidemiology, geography, finance and many others. In the literature, several clustering and classification algorithms have been prop...
Book
This book constitutes the proceedings of the 21st International Conference on Discovery Science, DS 2018, held in Limassol, Cyprus, in October 2018, co-located with the International Symposium on Methodologies for Intelligent Systems, ISMIS 2018. The 30 full papers presented together with 5 abstracts of invited talks in this volume were carefully r...