About
65 Publications
16,293 Reads
908 Citations
Introduction
Gianvito Pio currently works at the Department of Computer Science, Università degli Studi di Bari Aldo Moro. Gianvito does research in Data Mining, Artificial Intelligence, Big Data and Bioinformatics.
Publications (65)
Cryptocurrencies are virtual currencies that exploit cryptography to perform secure financial transactions. They gained widespread popularity in recent years due to their decentralized nature, (pseudo-)anonymity, and ability to facilitate cross-border transactions without the need for intermediaries. However, their price on the market exhibits a hu...
Background: Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or learn predictive models. However, data to be fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each diff...
The human microbiome has become an area of intense research due to its potential impact on human health. However, the analysis and interpretation of this data have proven to be challenging due to its complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationship...
The rapid development of machine learning (ML) techniques has opened up the data-dense field of microbiome research for novel therapeutic, diagnostic, and prognostic applications targeting a wide range of disorders, which could substantially improve healthcare practices in the era of precision medicine. However, several challenges must be addressed...
The massive adoption of social networks increased the need to analyze users’ data and interactions to detect and block the spread of propaganda and harassment behaviors, as well as to prevent actions influencing people towards illegal or immoral activities. In this paper, we propose HURI, a method for social network analysis that accurately classif...
The joint exploitation of data related to epidemiological, mobility, and restriction aspects of COVID-19 with machine learning algorithms can support the development of predictive models that can be used to forecast new positive cases and study the impact of more or less severe restrictions. In this work, we integrate heterogeneous data from severa...
In this paper, we formulate a hierarchical Bayesian version of the Mixture of Unigrams model for text clustering and approach its posterior inference through variational inference. We compute the explicit expression of the variational objective function for our hierarchical model under a mean-field approximation. We then derive the update equations...
The massive spread of social networks provided a plethora of new possibilities to communicate and interact worldwide. On the other hand, they introduced some negative phenomena related to social media addictions, as well as additional tools for cyberbullying and cyberterrorism activities. Therefore, monitoring operations on the posted contents and...
Smart grids are networks that distribute electricity by relying on advanced communication technologies, sensor measurements, and predictive methods, to quickly adapt the network behavior to different possible scenarios. In this context, the adoption of machine learning approaches to forecast the customer energy consumption is essential to optimize...
In an era characterized by fast technological progress, working in the law field is very difficult if not supported by the right tools. In this paper, we present a novel method, called JPReg, that identifies paragraph regularities in legal case judgments to support legal experts during the preparation of new legal documents (i.e., paragraphs of e...
The blockchain is a disruptive technology born in the last few years, whose possible applications in different domains are being extensively studied. In this context, healthcare appears to be a very attractive application domain for the blockchain because, due to its characteristics, it can provide the necessary guarantees on the secure processing,...
As the complexity of data increases, so does the importance of powerful representations, such as relational and logical representations, as well as the need for machine learning methods that can learn predictive models in such representations. A characteristic of these representations is that they give rise to a huge number of features to be consid...
In many real-world domains, data can naturally be represented as networks. This is the case of social networks, bibliographic networks, sensor networks and biological networks. Some dynamism often characterizes these networks as their structure (i.e., nodes and edges) continually evolves. Considering this dynamism is essential for analyzing these n...
Matrix tri-factorization subject to binary constraints is a versatile and powerful framework for the simultaneous clustering of observations and features, also known as biclustering. Applications for biclustering encompass the clustering of high-dimensional data and explorative data mining, where the selection of the most important features is rele...
The huge amount of data generated by sensor networks enables many potential analyses. However, one important limiting factor for the analyses of sensor data is the possible presence of anomalies, which may affect the validity of any conclusion we could draw. This aspect motivates the adoption of a preliminary anomaly detection method. Existing meth...
Motivation: Gene regulation is responsible for controlling numerous physiological functions and dynamically responding to environmental fluctuations. Reconstructing the human network of gene regulatory interactions is thus paramount to understanding the cell functional organisation across cell types, as well as to elucidating pathogenic processes...
In an era characterized by fast technological progress that introduces new unpredictable scenarios every day, working in the law field may appear very difficult, if not supported by the right tools. In this respect, some systems based on Artificial Intelligence methods have been proposed in the literature, to support several tasks in the legal sect...
The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. Many of the challenges in microbiome research are similar to those of other high-throughput studies: the quantitative analyses need to address...
The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data, supported by machine learning approaches, has received increasing attention in recent years. The task at hand is to identify regulatory links between genes in a network. However, existing methods often suffer when the number of labeled examples is low or when no negati...
Smart grids are power grids where clients may actively participate in energy production, storage and distribution. Smart grid management raises several challenges, including the possible changes and evolutions in terms of energy consumption and production, that must be taken into account in order to properly regulate the energy distribution. In thi...
Gene network reconstruction is a bioinformatics task that aims at modelling the complex regulatory activities that may occur among genes. This task is typically solved by means of link prediction methods that analyze gene expression data. However, the reconstructed networks often suffer from a high amount of false positive edges, which are actually...
Background. The study of functional associations between ncRNAs and human diseases is a pivotal task of modern research to develop new and more effective therapeutic approaches. Nevertheless, it is not a trivial task since it involves entities of different types, such as microRNAs, lncRNAs or target genes whose expression also depends on endogenous...
Motivation: The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data has received increasing attention in recent years, due to its usefulness in the understanding of regulatory mechanisms involved in human diseases. Most of the existing methods reconstruct the network through machine learning approaches, by analyzing known e...
Recent developments in sensor networks and mobile computing have led to a huge increase in generated data that need to be processed and analyzed efficiently. In this context, many distributed data mining algorithms have recently been proposed. Following this line of research, we propose the DENCAST system, a novel distributed algorithm implemented in Ap...
Heterogeneous networks are networks consisting of different types of objects and links. They can be found in several fields, ranging from the Internet to social sciences, biology, epidemiology, geography and finance. Several methods have already been proposed for the analysis of network data, but they usually focus on homogeneous networks, where ob...
Heterogeneous information networks consist of different types of objects and links. They can be found in several social, economic and scientific fields, ranging from the Internet to social sciences, including biology, epidemiology, geography, finance and many others. In the literature, several clustering and classification algorithms have been prop...
The reconstruction of gene regulatory networks via link prediction methods is receiving increasing attention due to the large availability of data, mainly produced by high throughput technologies. However, the reconstructed networks often suffer from a high amount of false positive links, which are actually the result of indirect regulation activit...
Heterogeneous information networks are networks consisting of different types of objects and links, and can be found in several social, economical and scientific fields. In the literature, many clustering and classification algorithms have been proposed which work on networks, but they are usually tailored for homogeneous networks or they make stri...
The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has al...
Comparison of AUROC/AUPRC results with different numbers of unlabeled examples for each sample. (XLSX)
Graphic representation of views identified with PCA and K-means. (PDF)
Additional details on the SynTReN and DREAM5 datasets. (PDF)
AUROC/AUPRC results on the SynTReN and DREAM5 datasets. (XLSX)
AUROC/AUPRC boxplots obtained with PCA-based clustering. (PDF)
Most works on text categorization have focused on classifying documents into a set of categories with no relationships among them (flat classification). However, due to the intrinsic structure that can be found in many domains, recent works are focusing on more complex tasks, such as multi-label classification, hierarchical classification and m...
The understanding of mechanisms and functions of microRNAs (miRNAs) is fundamental for the study of many biological processes and for the elucidation of the pathogenesis of many human diseases. Technological advances represented by high-throughput technologies, such as microarray and next-generation sequencing, have significantly aided miRNA resear...
Non-negative dyadic data, that is data representing observations which relate two finite sets of objects, appear in several domain applications, such as text-mining-based information retrieval, collaborative filtering and recommender systems, micro-array analysis and computer vision. Discovering latent subgroups among data is a fundamental task to...
Network reconstruction from data is a data mining task which is receiving significant attention due to its applicability in several domains. For example, it can be applied in social network analysis, where the goal is to identify connections among the users and, thus, sub-communities. Another example can be found in computational biology, where t...
One of the recently addressed research directions focuses on the problem of mining topic evolutions from textual documents. Following this main stream of research, in this paper we face the different, but related, problem of mining the topic evolution of entities (persons, companies, etc.) mentioned in the documents. To this aim, we incrementally a...
Studying Greek and Latin cultural heritage has always been considered essential to the understanding of important aspects of the roots of current European societies. However, only a small fraction of the total production of texts from ancient Greece and Rome has survived up to the present, leaving many gaps in the historiographic records. Epigraphy...
MicroRNAs (miRNAs) are small non-coding RNAs which play a key role in the post-transcriptional regulation of many genes. Elucidating miRNA-regulated gene networks is crucial for the understanding of mechanisms and functions of miRNAs in many biological processes, such as cell proliferation, development, differentiation and cell homeostasis, as well...
Link prediction in network data is a data mining task which is receiving significant attention due to its applicability in various domains. An example can be found in social network analysis, where the goal is to identify connections between users. Another application can be found in computational biology, where the goal is to identify previous...
Classical Greek and Latin culture is the very foundation of the identity of modern Europe. Today, a variety of modern subjects and disciplines have their roots in the classical world: from philosophy to architecture, from geometry to law. However, only a small fraction of the total production of texts from ancient Greece and Rome has survived up to...
The huge amount of data produced by the advent of Next Generation Sequencing (NGS) technologies is providing scientists with an unprecedented potential to investigate and shed light on remote secrets of genomes. We have developed a new tool based on biclustering techniques, HOCCLUS2, which is able to significantly correlate multiple miRNAs and...
Background: microRNAs (miRNAs) are a class of small non-coding RNAs which have been recognized as ubiquitous post-transcriptional regulators. The analysis of interactions between different miRNAs and their target genes is necessary for the understanding of miRNAs' role in the control of cell life and death. In this paper we propose a novel data mini...
Fuzzy relations are simple mathematical structures that enable a very general representation of fuzzy knowledge, and fuzzy relational calculus offers a powerful machinery for approximate reasoning. However, one of the most relevant limitations of approximate reasoning is the efficiency bottleneck. In this paper, we present two implementations for f...
MicroRNAs (miRNAs) are a class of small non-coding RNAs which have been recognized as ubiquitous post-transcriptional regulators. The analysis of the interactions between miRNAs and their target messenger RNAs (mRNAs) can contribute to the understanding of miRNAs' role in the control of cell life and death. In this paper we present a novel bicluste...
Motivations: microRNAs (miRNAs) are post-transcriptional regulators that represent one of the major regulatory gene families in animals, plants and viruses and play a key role in almost all main cellular processes. The computational prediction of miRNA target genes is important for the functional annotation of genomes and, on the other side,...
microRNAs (miRNAs) are an important class of regulatory factors controlling gene expressions at post-transcriptional level. Studies on interactions between different miRNAs and their target genes are of utmost importance to understand the role of miRNAs in the control of biological processes. This paper contributes to these studies by proposing a m...
One of the recently addressed research directions focuses on the issues raised by the diffusion of highly dynamic on-line information, particularly on the problem of mining topic evolutions from news. Among several applications, risk identification and analysis may exploit mining topic evolution from news in order to support law enforcement officer...