Ruggero G. Pensa

Università degli Studi di Torino | UNITO · Dipartimento di Informatica

Associate Professor


93 Publications
18,072 Reads
1,200 Citations
Citations since 2017
39 Research Items
690 Citations
Introduction
My main research interests are data mining and knowledge discovery, bioinformatics (gene expression data analysis), privacy-preserving algorithms for data management, and social network analysis.
Additional affiliations
December 2011 - December 2011
Università degli Studi di Torino
Position
  • Research Associate
December 2011 - present
Università degli Studi di Torino
Position
  • Professor (Assistant)
November 2010 - October 2011
Italian National Research Council
Position
  • Research Associate
Education
November 2003 - November 2006
October 1998 - November 2003
Politecnico di Torino
Field of study
  • Computer Engineering


Publications (93)
Article
Full-text available
User-generated contents often contain private information, even when they are shared publicly on social media and on the web in general. Although many filtering and natural language approaches for automatically detecting obscenities or hate speech have been proposed, determining whether a shared post contains sensitive information is still an open...
Conference Paper
Full-text available
Semi-supervised learning is crucial in many applications where accessing class labels is unaffordable or costly. The most promising approaches are graph-based but they are transductive and they do not provide a generalized model working on inductive scenarios. To address this problem, we propose a generic framework for inductive semi-supervised lea...
Conference Paper
Full-text available
Distance-based machine learning methods have limited applicability to categorical data, since they do not capture the complexity of the relationships among different values of a categorical attribute. Nonetheless, categorical attributes are common in many application scenarios, including clinical and health records, census and survey data. Although...
Article
Full-text available
Owing to their stability and detectability, faecal microRNAs are promising molecules with potential clinical interest as non-invasive diagnostic and prognostic biomarkers. However, there is no evidence on how stool miRNA profiles change according to an individual’s age, sex, and body mass index (BMI) or how lifestyle habits influence the expression...
Article
Full-text available
Most privacy-preserving machine learning methods are designed around continuous or numeric data, but categorical attributes are common in many application scenarios, including clinical and health records, census and survey data. Distance-based methods, in particular, have limited applicability to categorical data, since they do not capture the comp...
Article
Full-text available
Semi-supervised learning is crucial in many applications where accessing class labels is unaffordable or costly. The most promising approaches are graph-based but they are transductive and they do not provide a generalized model working on inductive scenarios. To address this problem, we propose a generic framework, ESA☆, for inductive semi-supervi...
Article
Full-text available
Objectives: MicroRNA (miRNA) profiles have been evaluated in several biospecimens in relation to common diseases for which diet may have a considerable impact. We aimed at characterising how specific diets are associated with the miRNome in stool of vegans, vegetarians and omnivores and how this is reflected in the gut microbial composition, as thi...
Article
Full-text available
The majority of the data produced by human activities and modern cyber-physical systems involve complex relations among their features. Such relations can often be represented by means of tensors, which can be viewed as a generalization of matrices and, as such, can be analyzed by using higher-order extensions of existing machine learning methods, su...
Article
Full-text available
Satellite image time series (SITS) collected by modern Earth Observation (EO) systems represent a valuable source of information that supports several tasks related to the monitoring of the Earth surface dynamics over large areas. A main challenge is then to design methods able to leverage the complementarity between the temporal dynamics and the s...
Article
Full-text available
Contagion processes have been widely studied in epidemiology and life science in general, but their implications are largely tangible in other research areas, such as in network science and computational social science. Contagion models, in particular, have proven helpful in the study of information diffusion, a very topical issue thanks to its app...
Chapter
Full-text available
With the availability of user-generated content on the Web, malicious users have at their disposal huge repositories of private (and often sensitive) information regarding a large part of the world’s population. The self-disclosure of personal information, in the form of text, pictures and videos, exposes the authors of such contents (and not only them) to man...
Article
Full-text available
Semi-supervised learning is a family of classification methods conceived to reduce the amount of required labeled information in the training phase. Graph-based methods are among the most popular semi-supervised strategies: a nearest neighbor graph is built in such a way that the manifold of the data is captured and the labeled information is propa...
Article
Full-text available
The quality of the transport system offered at city level constitutes an important and challenging goal for society, for local authorities, and transport operators. Therefore, appropriate evaluation of travellers' satisfaction is required to support service performance monitoring, benchmarking, and market analysis. This aspect implies the collectio...
Chapter
Full-text available
In most real-world scenarios, experts have limited background knowledge at their disposal that they can exploit for guiding the analysis process. In this context, semi-supervised clustering can be employed to leverage such knowledge and enable the discovery of clusters that meet the analysts’ expectations. To this end, we propose a semi-supervised deep embeddi...
Chapter
Full-text available
Tensor co-clustering has proven useful in many applications, due to its ability to cope with high-dimensional data and sparsity. However, setting up a co-clustering algorithm properly requires the specification of the desired number of clusters for each mode as input parameters. This choice is already difficult in relatively easy settings,...
Presentation
Introduction: ISTAT data (2018) report that 86% of minors aged between 11 and 14 use the internet, 5% more than in 2014. Digital Natives are the new generation of students, who have grown up with digital devices that allow them to connect to the network at any time of day. The use of new technologies and the greater...
Preprint
In most real-world scenarios, experts have limited background knowledge at their disposal that they can exploit for guiding the analysis process. In this context, semi-supervised clustering can be employed to leverage such knowledge and enable the discovery of clusters that meet the analysts' expectations. To this end, we propose a semi-supervised deep embeddi...
Conference Paper
In this paper we point out some relevant issues in relation to privacy when providing holistic recommendations. We emphasize that a holistic recommender should be fair, explainable and privacy-preserving to ensure the ethicality of the recommendation process. Further, we point out relevant research questions that should be addressed in the future,...
Article
Full-text available
Online social networks expose their users to privacy leakage risks. To measure the risk, privacy scores can be computed to quantify the users' profile exposure according to their privacy preferences or attitude. However, user privacy can also be influenced by external factors (e.g., the relative risk of the network, the position of the user within...
Book
This book constitutes revised selected papers from two workshops held at the 18th European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2018, in Dublin, Ireland, in September 2018, namely: MIDAS 2018 – Third Workshop on Mining Data for Financial Applications and PAP 2018 – Second International Workshop on Personal...
Article
Full-text available
Modern Earth Observation systems provide remote sensing data at different temporal and spatial resolutions. Among all the available space missions, today the Sentinel-2 program supplies high temporal (every five days) and high spatial resolution (HSR) (10 m) images that can be useful to monitor land cover dynamics. On the other hand, very HSR (VHS...
Article
Full-text available
The success of a film is usually measured through its box-office revenue or through the opinion of professional critics; such measures, however, may be influenced by external factors, such as advertisement or trends, and are not able to capture the impact of a film over time. Thanks to the recent availability of data on references among movies, som...
Article
Full-text available
In this paper, we address the problem of enhancing young people's awareness of the mechanisms involving privacy in online social networks by presenting an innovative approach based on gamification. In particular, we propose a web application that allows kids and teenagers to experience the typical dynamics of information spread through a realistic...
Conference Paper
Full-text available
The success of a movie is usually measured through its box-office revenue or the opinion of professional critics, but such measures may be influenced by external factors, such as advertisement or trends, and are not able to capture the impact of a film over time. A more efficient measure should account for the extent to which a given movie has influenced ot...
Article
Full-text available
Public participation has become an important driver in increasing public acceptance of policy decisions, especially in the forestry sector, where conflicting interests among the actors are frequent. Stakeholder Analysis, complemented by Social Network Analysis techniques, was used to support the participatory process and to understand the complex r...
Preprint
Full-text available
Modern Earth Observation systems provide sensing data at different temporal and spatial resolutions. Among optical sensors, today the Sentinel-2 program supplies high temporal resolution (every 5 days) and high spatial resolution (10 m) images that can be useful to monitor land cover dynamics. On the other hand, Very High Spatial Resolution images (...
Article
Full-text available
Modern Earth Observation systems provide sensing data at different temporal and spatial resolutions. Among optical sensors, today the Sentinel-2 program supplies high temporal resolution (every 5 days) and high spatial resolution (10 m) images that can be useful to monitor land cover dynamics. On the other hand, Very High Spatial Resolution images (...
Article
Full-text available
The problem of user privacy enforcement in online social networks (OSN) cannot be ignored and, in recent years, Facebook and other providers have considerably improved their privacy protection tools. However, in OSNs the most powerful data protection “weapons” are the users themselves. The behavior of an individual acting in an OSN highly depends...
Conference Paper
Full-text available
The problem of user privacy enforcement in online social networks (OSN) cannot be ignored and, in recent years, Facebook and other providers have considerably improved their privacy protection tools. However, in OSNs the most powerful data protection “weapons” are the users themselves. The behavior of an individual acting in an OSN highly depends...
Conference Paper
Full-text available
Information diffusion is a widely studied topic thanks to its applications to social media/network analysis, viral marketing campaigns, influence maximization and prediction. In bibliographic networks, for instance, an information diffusion process takes place when some authors who publish papers on a given topic influence some of their neighbor...
Conference Paper
Full-text available
The maturity of structured knowledge bases and semantic resources has contributed to the enhancement of document clustering algorithms, which may take advantage of conceptual representations as an alternative to classic bag-of-words models. However, operating in the semantic space is not always the best choice in those domains where the choice of te...
Article
Full-text available
During our digital social life, we share terabytes of information that can potentially reveal private facts and personality traits to unexpected strangers. Despite the research efforts aiming at providing efficient solutions for the anonymization of huge databases (including networked data), in online social networks the most powerful privacy prote...
Article
Full-text available
In this paper, we introduce a new approach to semi-supervised anomaly detection that deals with categorical data. Given a training set of instances (all belonging to the normal class), we analyze the relationship among features for the extraction of a discriminative characterization of the anomalous instances. Our key idea is to build a model that c...
Conference Paper
Full-text available
Humans like to disseminate ideas and news, as proved by the huge success of online social networking platforms such as Facebook or Twitter. On the other hand, these platforms have emphasized the dark side of information spreading, such as the diffusion of private facts and rumors in the society. Fortunately, in some cases, online social network use...
Conference Paper
Full-text available
During our digital social life, we share terabytes of information that can potentially reveal private facts and personality traits to unexpected strangers. Despite the research efforts aiming at providing efficient solutions for the anonymization of huge databases (including networked data), in online social networks the most powerful privacy prote...
Conference Paper
Full-text available
The risks due to a global and unaware diffusion of our personal data cannot be overlooked when more than two billion people are estimated to be registered in at least one of the most popular online social networks. As a consequence, privacy has become a primary concern among social network analysts and Web/data scientists. Some studies propose to "...
Article
Full-text available
The way we watch television is changing with the introduction of attractive Web activities that move users away from TV to other media. The social multimedia and user-generated contents are dramatically changing all phases of the value chain of contents (production, distribution and consumption). We propose a concept-level integration framework in...
Article
Location-based social networks (LBSN) are capturing large amounts of data related to the whereabouts of their users. This has become a social phenomenon that is changing the normal means of communication and opens new research perspectives on how to compute descriptive models out of this collection of geo-spatial data. In this paper, we propose a metho...
Article
Full-text available
The valorization and promotion of worldwide Cultural Heritage by the adoption of Information and Communication Technologies represent nowadays some of the most important research issues with a large variety of potential applications. This challenge is particularly perceived in the Italian scenario, where the artistic patrimony is one of the most di...
Article
In common binary classification scenarios, the presence of both positive and negative examples in training data is needed to build an efficient classifier. Unfortunately, in many domains, this requirement is not satisfied and only one class of examples is available. To cope with this setting, classification algorithms have been introduced that lear...
Chapter
In the last decade, the spread of broadband Internet connections even for mobile devices has contributed to an increased availability of multimedia information on the Web. At the same time, due to the decrease of storage cost and the increasing popularity of storage services in the cloud, the problem of information overload has become extremely ser...
Article
Full-text available
The increasing availability of gene expression data has encouraged the development of purposely-built intelligent data analysis techniques. Grouping genes characterized by similar expression patterns is a widely accepted - and often mandatory - analysis step. Despite the fact that a number of biclustering methods have been developed to discover clu...
Article
Full-text available
The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if it is possible to reconstruct the identity of individuals from sequential data. Therefore, it is important to develop privacy-...
Conference Paper
Full-text available
In this paper, we present a research prototype for creating geographic summaries using the whereabouts of Foursquare users. Exploiting the density of the venue types in a particular region, the system adds a layer over any typical cartography geographic maps service, creating a first glance summary over the venues sampled from the Foursquare knowle...
Article
In this work, we present a general framework for Cultural Heritage applications able to uniformly manage heterogeneous multimedia data coming from several web repositories and to provide context-aware recommendation services in order to generate dynamic multimedia visiting paths useful for the users during the exploration of different kinds of cul...
Chapter
In this chapter the authors propose a new methodology that minimizes the intervention of the analyst within the coclustering process and that provides meaningful coclusters whose discovery and interpretation are enhanced by embedding gene ontology (GO) annotations. To show the effectiveness of this approach, the authors apply their methodology on a...
Conference Paper
Italy’s Cultural Heritage is the world’s most diverse and rich patrimony and attracts millions of visitors every year to monuments, archaeological sites and museums. The valorization of cultural heritage represents nowadays one of the most important research challenges in the Italian scenario. In this paper, we present a general multimedia recommen...
Conference Paper
Full-text available
People on the Web talk about television. TV users' social activities implicitly connect the concepts referred to by videos, news, comments, and posts. The strength of such connections may change as the perception of users on the Web changes over time. With the goal of leveraging users' social activities to better understand how TV programs are perc...
Conference Paper
Full-text available
Searching, browsing and analyzing web content is today a challenging problem compared to the early days of the Internet. This is due to the fact that web content is multimedia, social and dynamic. Moreover, concepts referred to by videos, news, comments, posts, are implicitly linked by the fact that people on the Web talk about something, somewhere at so...
Article
Full-text available
The availability of data represented with multiple features coming from heterogeneous domains is getting more and more common in real world applications. Such data represent objects of a certain type, connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these da...
Article
Full-text available
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered. In this article, we propose a framework to learn a context-based distance for categorical attri...
Article
Full-text available
Clustering data is challenging especially for two reasons. The dimensionality of the data is often very high which makes the cluster interpretation hard. Moreover, with high-dimensional data the classic metrics fail in identifying the real similarities between objects. The second challenge is the evolving nature of the observed phenomena which make...
Article
In the generic setting of objects × attributes matrix data analysis, co-clustering appears as an interesting unsupervised data mining method. A co-clustering task provides a bi-partition made of co-clusters: each co-cluster is a group of objects associated to a group of attributes and these associations can support expert interpretations. Many cons...
Conference Paper
Full-text available
The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. To this purpose, co-clustering techniques are widely used: these identify groups of genes that show similar activity patt...
Conference Paper
Full-text available
Clustering high-dimensional data is challenging. Classic metrics fail to identify real similarities between objects. Moreover, the huge number of features makes cluster interpretation hard. To tackle these problems, several co-clustering approaches have been proposed which try to compute a partition of objects and a partition of features s...
Conference Paper
Full-text available
Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance between pairs of values of the same categorical attribute, since they are not ordered. In this paper, we propose a method to learn a context-based distance for categorical attribute...
Conference Paper
Full-text available
Today, digital bibliographies are a powerful instrument that collects a great amount of data about scientific publications. Digital bibliographies have been used as the basis of many studies focused on knowledge extraction in databases. Here we present a new methodology for mining knowledge in this field. Our approach aims to apply the potential of s...
Conference Paper
The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if it is possible to reconstruct the identity of individuals from sequential data. Therefore, it is important to develop privacy-...
Article
Full-text available
There is an increasing need in transcriptome research for gene expression data and pattern warehouses. It is important to integrate in these warehouses both raw transcriptomic data and some properties encoded in these data, like local patterns. We have developed an application called SQUAT (SAGE Querying and Analysis Tools) which is ava...
Data
SQUAT relational schema. This figure displays the tables of the SQUAT database and the relations between them.
Article
Full-text available
We investigate a co-clustering framework (i.e., a method that provides a partition of objects and a linked partition of features) for binary data sets. So far, constrained co-clustering has seldom been explored. First, we consider straightforward extensions of the classical instance level constraints (must-link, cannot-link) to express relationsh...
Conference Paper
Full-text available
In many applications, the expert interpretation of co-clustering is easier than for mono-dimensional clustering. Co-clustering aims at computing a bi-partition that is a collection of co-clusters: each co-cluster is a group of objects associated to a group of attributes and these associations can support interpretations. Many constrained cluster...