About
43 Publications
18,019 Reads
301 Citations
Introduction
Postdoc @ LIAAD - INESC Tec and Invited Professor @ Sciences College, Oporto University.
Education
March 2013 - May 2017
September 2009 - November 2012
September 2005 - September 2009
Publications (43)
Research in imbalanced domain learning has almost exclusively focused on solving classification tasks for accurate prediction of cases labelled with a rare class. Approaches for addressing such problems in regression tasks are still scarce due to two main factors. First, standard regression tasks assume each domain value as equally important. Secon...
Time series forecasting is a challenging task, where the non-stationary characteristics of the data portray a hard setting for predictive tasks. A common issue is the imbalanced distribution of the target variable, where some intervals are very important to the user but severely underrepresented. Standard regression tools focus on the average beha...
Time series forecasting is a challenging task, where the non-stationary characteristics of data portray a hard setting for predictive tasks. A common issue is the imbalanced distribution of the target variable, where some values are very important to the user but severely underrepresented. Standard prediction tools focus on the average behaviour o...
Supervised learning with imbalanced domains is one of the biggest challenges in machine learning. Such tasks differ from standard learning tasks by assuming a skewed distribution of target variables, and user domain preference towards under-represented cases. Most research has focused on imbalanced classification tasks, where a wide range of soluti...
With the profusion of web content, researchers have avidly studied and proposed new approaches to enable the anticipation of its impact on social media, presenting many distinct approaches throughout the last decade. Diverse approaches have been presented to tackle the problem of web content popularity prediction, including standard classification...
As artificial intelligence becomes more pervasive, explainability and the need to interpret machine learning models’ behavior emerge as critical issues. Discussions are usually bounded by those who argue that interpretable models must be the rule or that non-interpretable models’ ability to capture more complex patterns warrants their use. In this...
As the frontier of machine learning applications moves further into human interaction, multiple concerns arise regarding automated decision-making. Two of the most critical issues are fairness and data privacy. On the one hand, one must guarantee that automated decisions are not biased against certain groups, especially those unprotected or margina...
The rapid advancement in data-driven research has increased the demand for effective graph data analysis. However, real-world data often exhibits class imbalance, leading to poor performance of machine learning models. To overcome this challenge, class-imbalanced learning on graphs (CILG) has emerged as a promising solution that combines the streng...
The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals’ privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of confidential information disclosure by de-ident...
We can protect user data privacy via many approaches, such as statistical transformation or generative models. However, each of them has critical drawbacks. On the one hand, creating a transformed data set using conventional techniques is highly time-consuming. On the other hand, in addition to long training phases, recent deep learning-based solut...
Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain. Research in this field has been mainly focused on classification tasks. Comparatively, the number of studies carried out in the context of regression tasks is negligible. One of the main reasons...
The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals' privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of confidential information disclosure by de-ident...
Machine learning is increasingly used in the most diverse applications and domains, whether in healthcare, to predict pathologies, or in the financial sector to detect fraud. One of the linchpins for efficiency and accuracy in machine learning is data utility. However, when it contains personal information, full access may be restricted due to laws...
Extreme and rare events, such as spikes in air pollution or abnormal weather conditions, can have serious repercussions. Many of these sorts of events develop through spatio-temporal processes. Timely and accurate predictions are a most valuable tool in addressing their impact. We propose a new set of resampling strategies for imbalanced spatio-tem...
Faced with the emergence of the Covid-19 pandemic, and to better understand and contain the disease’s spread, health organisations increased their collaboration with other organisations, sharing health data with data scientists and researchers. Data analysis assists such organisations in providing information that could help in decision-making process...
The No Free Lunch (NFL) theorems have sparked intense debate since their publication, from theoretical and practical perspectives. However, to date, no discussion has been provided concerning their impact on the established field of imbalanced domain learning (IDL), known for its challenges regarding learning and evaluation processes. Most importantly...
Privacy-preservation has become an essential concern in many data mining applications since the emergence of legal obligations to protect personal data. Thus, the notion of Privacy-Preserving Data Mining emerged to allow the extraction of knowledge from data without violating the privacy of individuals. Several transformation techniques have been p...
Time series forecasting is a challenging task with applications in a wide range of domains. Auto-regression is one of the most common approaches to address these problems. Accordingly, observations are modelled by multiple regression using their past lags as predictor variables. We investigate the extension of auto-regressive processes using statis...
Imbalanced learning is one of the most relevant problems in machine learning. However, it faces two crucial challenges. First, the number of methods proposed to deal with this problem has grown immensely, making the validation of a large set of methods impractical. Second, it requires specialised knowledge, hindering its use by those without such l...
In today’s software industry, systems are constantly changing. To maintain their quality and to prevent failures at controlled costs is a challenge. One way to foster quality is through thorough and systematic testing. Therefore, the definition of adequate tests is crucial for saving time, cost and effort. This paper presents a framework that gener...
This paper presents a data set describing the evolution of results in the Portuguese Parliamentary Elections of October 6th, 2019. The data spans a time interval of 4 hours and 25 minutes, in intervals of 5 minutes, concerning the results of the 27 parties involved in the electoral event. The data set is tailored for predictive modelling tasks,...
[ FULL-TEXT FREELY AVAILABLE AT: https://www.dcc.fc.up.pt/~moliveira/pdf/19DSAA_BiasedResampling_postPrint.pdf ] Extreme and rare events, such as abnormal spikes in air pollution or weather conditions, can have serious repercussions. Many of these sorts of events develop from spatio-temporal processes, and accurate predictions are a most valuable to...
While the predictive advantage of ensemble methods is nowadays widely accepted, the most appropriate way of estimating the weights of each individual model remains an open research question. Meanwhile, several studies report that combining different ensemble approaches leads to improvements in performance, due to a better trade-off between the dive...
The ability to generate and share content in social media platforms has changed the Internet. With a growing rate of content generation, efforts have been directed at making sense of such data. One of the most researched problems concerns predicting web content popularity. We argue that the evolution of state-of-the-art approaches has been optimize...
The profusion of user-generated content caused by the rise of social media platforms has enabled a surge in research relating to fields such as information retrieval, recommender systems, data mining and machine learning. However, the lack of comprehensive baseline data sets to allow a thorough evaluative comparison has become an important issue. I...
Ensemble methods are well known for providing an advantage over single models in a large range of data mining and machine learning tasks. Their benefits are commonly associated with the ability to reduce the bias and/or variance in learning tasks. Ensembles have been studied both for classification and regression tasks with uniform domain preferenc...
This thesis addresses prediction and ranking tasks using web content data. The main objective is to improve the ability to accurately predict and rank recent and highly popular content, thus enabling a faster and more precise recommendation of such items. The main motivation relates to the profusion of online content, and the increasing demand of u...
Social media is rapidly becoming the main source of news consumption for users, raising significant challenges to news aggregation and recommendation tasks. One of these challenges concerns the recommendation of very recent news. To tackle this problem, approaches to the prediction of news popularity have been proposed. In this paper we study the t...
Ranking evaluation metrics are a fundamental element of design and improvement efforts in information retrieval. We observe that most popular metrics disregard information portrayed in the scores used to derive rankings, when available. This may pose a numerical scaling problem, causing an under- or over-estimation of the evaluation depending on th...
Thousands of news items are published every day reporting worldwide events. Most of these items obtain a low level of popularity and only a small set of events become highly popular in social media platforms. Predicting rare cases of highly popular news is not a trivial task due to shortcomings of standard learning approaches and evaluation metrics. So far...
The process of decision making in humans involves a combination of the genuine information held by the individual, and the external influence from their social network connections. This helps individuals to make decisions or adopt behaviors, opinions or products. In this work, we seek to investigate under which conditions and with what cost we can...
The Portuguese governmental network comprising all the 776 ministers and junior ministers who were part of the 19 governments between 1976 and 2013 is presented and analysed. The data contains information on connections concerning business and other types of organizations and, to our knowledge, there is no such extensive research in previo...
This article discusses the relationships established between capital owners and the groups of rulers and former rulers, embracing a critical perspective capable of enhancing the State’s role in the definition of economic power. Special attention is given to the cooptation process, an analysis that includes data on 776 rulers who occupied 1281 posit...
A data set containing information on the explicit connections concerning all members of Portuguese governments from 1976 until July 2013 is presented. This information was collected through a year-long research effort carried out by the authors using public records and official information (public and private institutions). The data set was collected durin...
The question of how to recommend and manage information available on the Internet is an active area of research, namely concerning news recommendations. The methods used to produce news rankings by the most popular recommender systems are not public and it is unclear if they reflect the real importance assigned by the readers. Also, the latency per...
The methods used to produce news rankings by recommender systems are not public and it is unclear if they reflect the real importance assigned by readers. We address the task of trying to forecast the number of times a news item will be tweeted, as a proxy for the importance assigned by its readers. We focus on methods for accurately forecasting wh...
The use of the Internet to search for information regarding a variety of topics of interest to users is still growing. The issue of how to manage and recommend that information is still an open area of research, namely concerning news recommendations. In this paper we use a recently developed approach to predict extreme and rare values of a variable to...
This paper presents an approach for text processing of PDF documents with well-defined layout structure. The scope of the approach is to explore the font structure of PDF documents, using perceptual grouping. It consists of the extraction of text objects from the content stream of the documents and their grouping according to a set criterion, makin...