João Gama

University of Porto | UP · Laboratory of Artificial Intelligence and Decision Support (LIAAD)

PhD

About

513
Publications
226,285
Reads
16,191
Citations
Additional affiliations
January 2010 - present
March 2009 - present
University of Porto
Position
  • Professor (Associate)

Publications (513)
Preprint
The paper describes the Railway data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected between 2020 and 2022 for a project that aimed to develop machine learning methods for online anomaly detection and failure prediction. By capturing several analog sensor signals (...
Preprint
Full-text available
There has been a significant effort by the research community to address the problem of providing methods to organize documentation with the help of Information Retrieval methods. In this report, we present several experiments with stream analysis methods to explore streams of text documents. We use only dynamic algorithms to explore, an...
Conference Paper
This study applies a data-driven anomaly detection framework based on a Long Short-Term Memory (LSTM) autoencoder network for several subsystems of a public transport bus. The proposed framework efficiently detects abnormal data, significantly reducing the false alarm rate compared to available alternatives. Using historical repair records, we demo...
Article
Full-text available
Causality is a complex concept with roots in several fields, such as statistics, economics, epidemiology, computer science, and philosophy. In recent years, the study of causal relationships has become a crucial topic in the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of co...
Chapter
This article presents our recent work on the topic of learning from data streams. We focus on emerging topics, including fraud detection, learning from rare cases, and hyper-parameter tuning for streaming data.
Article
Full-text available
The increased development of urban areas results in a larger number of vehicles on the road network, leading to traffic congestion, which often leads to potentially dangerous situations that can be described as anomalies. Tensor-based methods emerged only recently in applications related to traffic anomaly detection. They outperform other model...
Article
Full-text available
There has been a significant effort by the research community to address the problem of providing methods to organize documentation, with the help of Information Retrieval methods. In this paper, we present several experiments with stream analysis methods to explore streams of text documents. This paper also presents possible architectures of the T...
Article
Full-text available
In recent years data stream mining and learning from imbalanced data have been active research areas. Even though solutions exist to tackle these two problems, most of them are not designed to handle challenges inherited from both problems. As far as we are aware, the few approaches in the area of learning from imbalanced data streams fall in the c...
Preprint
Full-text available
We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. To this end, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical...
Article
Typically, classification algorithms use correlation analysis to make decisions. However, these decisions and the models they learn are not easily understandable for the typical user. Causal discovery is the field that studies the means to find causal relationships in observational data. Although highly interpretable, causal discovery algorithms te...
Conference Paper
Motivated by the challenges of Big Data, this paper presents an approximate algorithm to compute the Kolmogorov-Smirnov test. This goodness-of-fit statistical test is extensively used because it is non-parametric. This work focuses on the one-sample test, which considers the hypothesis that a given univariate sample follows some reference distribu...
Chapter
We present an online optimization method for time-evolving data streams that can automatically adapt the hyper-parameters of an embedding model. More specifically, we employ the Nelder-Mead algorithm, which uses a set of heuristics to produce and exploit several potentially good configurations, from which the best one is selected and deployed. This...
Article
Full-text available
In the last few years, many works have addressed Predictive Maintenance (PdM) by the use of Machine Learning (ML) and Deep Learning (DL) solutions, especially the latter. The monitoring and logging of industrial equipment events, like temporal behavior and fault events (anomaly detection in time series), can be obtained from records generated by senso...
Chapter
One of the most significant challenges for machine learning nowadays is the discovery of causal relationships from data. This causal discovery is commonly performed using Bayesian-like algorithms. Recently, however, a growing number of causal discovery algorithms have appeared that do not fall into this category. In this paper, we present a new algo...
Chapter
This work aims to develop a Machine Learning framework to predict voting behaviour. The data consist of variables collected longitudinally during the Portuguese 2019 general election campaign. Naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN), and three different expert models using Dynamic Bayesian Networks (DBN) predict voting behaviour system...
Chapter
Topic modeling or inference has been one of the well-known problems in the area of text mining. It deals with the automatic categorisation of words or documents into similarity groups also known as topics. In most of the social media platforms such as Twitter, Instagram, and Facebook, hashtags are used to define the content of posts. Therefore, mod...
Article
Full-text available
The usage of non-traditional data in credit scoring, from microfinance institutions, is very useful when trying to address the problem, very common in emerging markets, of the lack of a verifiable customers’ credit history. In this context, this paper relies on data acquired from smartphones in a loan classification problem. We conduct a set of exp...
Article
Full-text available
This survey paper discusses opportunities and threats of using artificial intelligence (AI) technology in the manufacturing sector with consideration for offensive and defensive uses of such technology. It starts with an introduction to the Industry 4.0 concept and an understanding of AI use in this context. It then provides elements of security principle...
Article
Full-text available
Densification events in time-evolving networks refer to instants in which the network density, that is, the number of edges, is substantially larger than in the remaining instants. These events can occur at a global level, involving the majority of the nodes in the network, or at a local level involving only a subset of nodes. While global densification even...
Article
Full-text available
The significant growth of interconnected Internet‐of‐Things (IoT) devices, the use of social networks, along with the evolution of technology in different domains, lead to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In p...
Article
Full-text available
Social networks are becoming larger and more complex as new ways of collecting social interaction data arise (namely from online social networks, mobile device sensors, ...). These networks are often large-scale and of high dimensionality. Therefore, dealing with such networks has become a challenging task. An intuitive way to deal with this complexit...
Article
The number of Internet of Things devices generating data streams is expected to grow exponentially with the support of emergent technologies such as 5G networks. The online processing of these data streams therefore requires the design and development of suitable machine learning algorithms, able to learn online, as data is generated. Like their ba...
Book
For some time now, the field of artificial intelligence has no longer been seen as merely theoretical (intended for application to small “curious” problems) and has become a growing research field in search of solutions to real problems of society. Winner of the 2012 Prêmio Jabuti (Technology and Informatics category) when it was released, Inteligênci...
Preprint
There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessmen...
Chapter
We provide a framework for analyzing geographical influence networks that have impacts on visit event sequences for a set of points of interest (POIs) in a city. Since mutually-exciting Hawkes processes can naturally model temporal event data and capture interactions between those events, previous work presented a probabilistic model based on Hawke...
Article
Probabilistic forecasting of distribution tails (i.e., quantiles below 0.05 and above 0.95) is challenging for non-parametric approaches since data for extreme events are scarce. A poor forecast of extreme quantiles can have a high impact in various power system decision-aid problems. An alternative approach more robust to data sparsity is extreme...
Book
This book constitutes the proceedings of the 19th International Symposium on Intelligent Data Analysis, IDA 2021, which was planned to take place in Porto, Portugal. Due to the COVID-19 pandemic the conference was held online during April 26-28, 2021. The 35 papers included in this book were carefully reviewed and selected from 113 submissions. The...
Conference Paper
Full-text available
The last few decades have witnessed a significant evolution of technology in different domains, changing the way the world operates, which leads to an overwhelming amount of data generated in an open-ended way as streams. Over the past years, we observed the development of several machine learning algorithms to process big data streams. However, th...
Article
Full-text available
Many aspects of our lives are associated with places and the activities we perform on a daily basis. Most of them are recurrent and demand displacement of the individual between regular places like going to work, school or other important personal locations. To accomplish these recurrent daily activities, people tend to follow regular paths with si...
Conference Paper
Full-text available
The increased development of urban areas results in a larger number of vehicles on the road network, leading to traffic congestion, especially in the rush hours. Intelligent Transport Systems (ITS) solutions offer applications that can be useful in detecting and dealing with congestion-related problems. This p...
Article
Tensor decompositions are multi-way analysis tools which have been successfully applied in a wide range of different fields. However, there are still challenges that remain little explored, namely the following: when applying tensor decomposition techniques, what should we expect from the result? How can we evaluate its quality? It is expected that, w...
Preprint
This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse ran...
Article
Full-text available
This paper provides a comprehensive state-of-the-art investigation of the recent advances in data science in emerging economic applications. The analysis is performed on the novel data science methods in four individual classes of deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains in...
Chapter
Full-text available
Tensor-based models emerged only recently in modeling and analysis of the spatiotemporal road traffic data. They outperform other data models regarding the property of simultaneously capturing both spatial and temporal components of the observed traffic dataset. In this paper, the nonnegative tensor decomposition method is used to extract traffic p...
Chapter
Since early 2000, Microfinance Institutions (MFI) have been using credit scoring for their risk assessment. However, one of the main problems of credit scoring in microfinance is the lack of structured financial data. To address this problem, MFI have started using non-traditional data which can be extracted from the digital footprint of their user...
Preprint
Full-text available
This paper provides the state of the art of data science in economics. Through a novel taxonomy of applications and methods, advances in data science are investigated. The data science advances are investigated in three individual classes of deep learning models, ensemble models, and hybrid models. Application domains include stock market, marketing...
Article
The high asymmetry of international termination rates is fertile ground for the appearance of fraud in telecom companies. International calls have higher values when compared with national ones, which raises the attention of fraudsters. In this paper, we present a solution for a real problem called interconnect bypass fraud, more specifically, a ne...
Article
Full-text available
The increasing presence of renewable energy plants has created new challenges such as grid integration, load balancing and energy trading, making it fundamental to provide effective prediction models. Recent approaches in the literature have shown that exploiting spatio-temporal autocorrelation in data coming from multiple plants can lead to better...
Article
Human mobility patterns are associated with many aspects of our life. With the increase of the popularity and pervasiveness of smartphones and portable devices, the Internet of Things (IoT) is turning into a permanent part of our daily routines. Positioning technologies that serve these devices such as the cellular antenna (GSM networks), global na...
Article
Nowadays, fraudsters are continually trying to exploit technical gaps in telecom companies for profit. The high cost of international termination rates in telecom companies, and mainly their high asymmetry, attracts the attention of fraudsters. In this paper, we explore the application of three deterministic algorithms a...
Article
Full-text available
Fraud in telephony incurs huge revenue losses and poses a threat to both the service providers and legitimate users. This problem is growing alongside advancing technologies. Yet, the works in this area are hindered by the limited availability of data and the confidentiality of approaches. In this work, we deal with the problem of detecting different types o...
Chapter
Full-text available
This paper analyses the impact of trust and reputation modelling on CloudAnchor, a business-to-business brokerage platform for the transaction of single and federated resources on behalf of Small and Medium Sized Enterprises (SME). In CloudAnchor, businesses act as providers or consumers of Infrastructure as a Service (IaaS) resources. The platform...
Article
Full-text available
Future cities of the Global South will not only rapidly urbanise but will also get warmer from climate change and urbanisation-induced effects. This will trigger a multi-fold increase in cooling demand, especially at a residential level, the mitigation of which remains a policy and research gap. This study puts forward a novel residential energy stress mitig...
Chapter
Full-text available
Feature engineering is commonly applied in classification problems as a means to increase the performance of classification algorithms. There are already many methods for constructing features, based on the combination of attributes but, to the best of our knowledge, none of these methods takes into account a particular character...