João Gama

João Gama
University of Porto | UP · Laboratory of Artificial Intelligence and Decision Support (LIAAD)

PhD

About

569
Publications
316,565
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
23,809
Citations
Additional affiliations
January 2010 - present
March 2009 - present
University of Porto
Position
  • Professor (Associate)

Publications

Publications (569)
Article
Full-text available
To maintain the performance of the latest generation of onshore and offshore wind turbine systems, a new methodology must be proposed to enhance the maintenance policy. In this context, this paper introduces an approach to designing a decision support tool that combines predictive capabilities with anomaly explanations for effective IoT predictive...
Chapter
Full-text available
Image segmentation for detecting illegal landfill waste in aerial images is essential for environmental crime monitoring. Despite advancements in segmentation models, the primary challenge in this domain is the lack of annotated data due to the unknown locations of illegal waste disposals. This work mainly focuses on evaluating segmentation models...
Chapter
Full-text available
Effective anomaly detection in telecommunication networks is essential for securing digital transactions and supporting the sustainability of our global information ecosystem. However, the volume of data in such high-speed distributed environments imposes strict latency and scalability requirements on anomaly detection systems. This study focuses o...
Preprint
Full-text available
Predictive Maintenance (PdM) emerged as one of the pillars of Industry 4.0, and became crucial for enhancing operational efficiency, allowing to minimize downtime, extend lifespan of equipment, and prevent failures. A wide range of PdM tasks can be performed using Artificial Intelligence (AI) methods, which often use data generated from industrial...
Article
Full-text available
We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. For such, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical...
Article
Full-text available
Hyper-parameter tuning of machine learning models has become a crucial task in achieving optimal results in terms of performance. Several researchers have explored the optimisation task during the last decades to reach a state-of-the-art method. However, most of them focus on batch or offline learning, where data distributions do not change arbitra...
Chapter
Over the past few decades, road transportation emissions have increased. Vehicles are among the most significant sources of pollutants in urban areas. As such, several studies and public policies emerged to address the issue. Estimating greenhouse emissions and air quality over space and time is crucial for human health and mitigating climate chang...
Chapter
E-commerce has become an essential aspect of modern life, providing consumers worldwide with convenience and accessibility. However, the high volume of short and noisy product descriptions in text streams of massive e-commerce platforms translates into an increased number of clusters, presenting challenges for standard model-based stream clustering...
Chapter
Trajectory clustering is one of the most important issues in mobility patterns data mining. It is applied in several cases such as hot-spots detection, urban transportation control, animal migration movements, and tourist visiting routes among others. In this paper, we describe how to identify the most frequent trajectories from raw GPS data. By ma...
Chapter
Full-text available
The transition to Industry 4.0 provoked a transformation of industrial manufacturing with a significant leap in automation and intelligent systems. This paradigm shift has brought about a mindset that emphasizes predictive maintenance: detecting future failures when current behaviour of industrial processes and machines is thought to be normal. The...
Preprint
Full-text available
Explainable Artificial Intelligence (XAI) fills the role of a critical interface fostering interactions between sophisticated intelligent systems and diverse individuals, including data scientists, domain experts, end-users, and more. It aids in deciphering the intricate internal mechanisms of ``black box'' Machine Learning (ML), rendering the reas...
Article
The emergence of the Industry 4.0 trend brings automation and data exchange to industrial manufacturing. Using computational systems and IoT devices allows businesses to collect and deal with vast volumes of sensorial and business process data. The growing and proliferation of big data and machine learning technologies enable strategic decisions ba...
Article
Full-text available
The human brain works in such a complex way that we have not yet managed to decipher its functional mysteries. It has five main channels that act as information input: the senses. Sight, hearing, taste, smell, and touch generate information that flows from their corresponding receptors, i.e., the eyes, ears, tongue, nose, and skin, that help us und...
Chapter
Full-text available
As the digital world grows, data is being collected at high speed on a continuous and real-time scale. Hence, the imposed imbalanced and evolving scenario that introduces learning from streaming data remains a challenge. As the research field is still open to consistent strategies that assess continuous and evolving data properties, this paper prop...
Preprint
Full-text available
In real-world scenario, many phenomena produce a collection of events that occur in continuous time. Point Processes provide a natural mathematical framework for modeling these sequences of events. In this survey, we investigate probabilistic models for modeling event sequences through temporal processes. We revise the notion of event modeling and...
Chapter
Water is a fundamental human resource and its scarcity is reflected in social, economic and environmental problems. Water used in human activities must be treated before reusing or returning to nature. This treatment takes place in wastewater treatment plants (WWTPs), which need to perform their functions with high quality, low cost, and reduced en...
Chapter
Road transportation emissions have increased in the last few decades and have been the primary source of pollutants in urban areas with ever-growing populations. In this context, it is important to have effective measures to monitor road emissions in regions. Creating an emission inventory over a region that can map the road emission based on the v...
Chapter
The demand for high-performance solutions for anomaly detection and forecasting fault events is increasing in the industrial area. The detection and forecasting faults from time-series data are one critical mission in the Internet of Things (IoT) data mining. The classical fault detection approaches based on physical modelling are limited to some m...
Chapter
The growing use of data-driven decision systems based on Artificial Intelligence (AI) by governments, companies and social organizations has given more attention to the challenges they pose to society. Over the last few years, news about discrimination appeared on social media, and privacy, among others, highlighted their vulnerabilities. Despite a...
Chapter
Predictive Maintenance applications are increasingly complex, with interactions between many components. Black-box models are popular approaches due to their predictive accuracy and are based on deep-learning techniques. This paper presents an architecture that uses an online rule learning algorithm to explain when the black-box model predicts rare...
Chapter
An online data-driven predictive maintenance approach for railway switches using data logs obtained from the interlocking system of the railway infrastructure is proposed in this paper. The proposed approach is detailed described and consists of a two-phase process: anomaly detection and remaining useful life prediction. The approach is applied to...
Article
Full-text available
Federated learning (FL) is a collaborative, decentralized privacy‐preserving method to attach the challenges of storing data and data privacy. Artificial intelligence, machine learning, smart devices, and deep learning have strongly marked the last years. Two challenges arose in data science as a result. First, the regulation protected the data by...
Article
Full-text available
The paper describes the MetroPT data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to develop machine learning methods for online anomaly detection and failure prediction. Several analog sensor signals (pressure, temperature, current consumpti...
Article
Full-text available
More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularl...
Article
Full-text available
Influence Analysis is one of the well‐known areas of Social Network Analysis. However, discovering influencers from micro‐blog networks based on topics has gained recent popularity due to its specificity. Besides, these data networks are massive, continuous and evolving. Therefore, to address the above challenges we propose a dynamic framework for...
Preprint
Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes...
Conference Paper
An online data-driven predictive maintenance approach for railway switches using data logs obtained from the interlocking system of the railway infrastructure is proposed in this paper. The proposed approach is detailed described and consists of a two-phase process: anomaly detection and remaining useful life prediction. The approach is applied to...
Chapter
Analyzing the way individuals move is fundamental to understand the dynamics of humanity. Transportation mode plays a significant role in human behavior as it changes how individuals travel, how far, and how often they can move. The identification of transportation modes can be used in many applications and it is a key component of the internet of...
Chapter
In hospital and after ICU discharge deaths are usual, given the severity of the condition under which many of them are admitted to these wings. Because of this, there is an urge to identify and follow these cases closely. Furthermore, as ICU data is usually composed of variables measured in varying time intervals, there is a need for a method that...
Preprint
Full-text available
The paper describes the Railway data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected between 2020 and 2022 that aimed to develop machine learning methods for online anomaly detection and failure prediction. By capturing several analogic sensor signals (...
Article
With the growth of 5G networks, the Internet-of-Things is becoming a reality. The advances in networking, machine learning, data analytics, and robotics largely improve industrial processes. Industry 4.0 is a term for the fourth industrial revolution: the digitization and automation of manufacturing.1 Predictive maintenance is one of the techniques...
Preprint
Full-text available
There has been a significant effort by the research community to address the problem of providing methods to organize documentation with the help of information Retrieval methods. In this report paper, we present several experiments with some stream analysis methods to explore streams of text documents. We use only dynamic algorithms to explore, an...
Conference Paper
This study applies a data-driven anomaly detection framework based on a Long Short-Term Memory (LSTM) autoencoder network for several subsystems of a public transport bus. The proposed framework efficiently detects abnormal data, significantly reducing the false alarm rate compared to available alternatives. Using historical repair records, we demo...
Chapter
Non-traditional data like the applicant’s bank statement is a significant source for decision-making when granting loans. We find that we can use methods from network science on the applicant’s bank statements to convert inherent cash flow characteristics to predictors for default prediction in a credit scoring or credit risk assessment model. Firs...
Article
Full-text available
The Internet of Things (IoT) envisions a smart environment powered by connectivity and heterogeneity where ensuring reliable services and communications across multiple industries, from financial fields to healthcare and fault detection systems, is a top priority. In such fields, data is being collected and broadcast at high speed on a continuous a...
Article
Full-text available
Causality is a complex concept, which roots its developments across several fields, such as statistics, economics, epidemiology, computer science, and philosophy. In recent years, the study of causal relationships has become a crucial part of the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of co...
Chapter
This study applies a data-driven anomaly detection framework based on a Long Short-Term Memory (LSTM) autoencoder network for several subsystems of a public transport bus. The proposed framework efficiently detects abnormal data, significantly reducing the false alarm rate compared to available alternatives. Using historical repair records, we demo...
Chapter
Systems based on Artificial Intelligence, namely Data-driven decision systems have been used in the private sector in areas such as retail, finance, and telecommunications. More recently, data-driven decision systems started to be applied in different areas of public interest, such as health, urban planning, education, criminal justice, and public...
Chapter
This article presents our recent work on the topic of learning from data streams. We focus on emerging topics, including fraud detection, learning from rare cases, and hyper-parameter tuning for streaming data.
Article
Full-text available
The increased development of urban areas results in a larger number of vehicles on the road network, leading to traffic congestion, which often leads to potentially dangerous situations that can be described as anomalies. The tensor-based methods emerged only recently in applications related to traffic anomaly detection. They outperform other model...
Article
Full-text available
There has been a significant effort by the research community to address the problem of providing methods to organize documentation, with the help of Information Retrieval methods. In this paper, we present several experiments with stream analysis methods to explore streams of text documents. This paper also presents possible architectures of the T...
Article
Full-text available
In recent years data stream mining and learning from imbalanced data have been active research areas. Even though solutions exist to tackle these two problems, most of them are not designed to handle challenges inherited from both problems. As far as we are aware, the few approaches in the area of learning from imbalanced data streams fall in the c...
Preprint
Full-text available
Sharing of telecommunication network data, for example, even at high aggregation levels, is nowadays highly restricted due to privacy legislation and regulations and other important ethical concerns. It leads to scattering data across institutions, regions, and states, inhibiting the usage of AI methods that could otherwise take advantage of data a...
Preprint
Full-text available
We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. For such, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical...
Article
Typically, classification algorithms use correlation analysis to make decisions. However, these decisions and the models they learn are not easily understandable for the typical user. Causal discovery is the field that studies the means to find causal relationships in observational data. Although highly interpretable, causal discovery algorithms te...
Conference Paper
Full-text available
Motivated by the challenges of Big Data, this paper presents an approximative algorithm to assess the Kolmogorov-Smirnov test. This goodness of fit statistical test is extensively used because it is non-parametric. This work focuses on the one-sample test, which considers the hypothesis that a given univariate sample follows some reference distribu...
Chapter
Numerous machine learning applications involve dealing with imbalanced domains, where the learning focus is on the least frequent classes. This imbalance introduces new challenges for both the performance assessment of these models and their predictive modeling. While several performance metrics have been established as baselines in balanced domain...
Chapter
We present an online optimization method for time-evolving data streams that can automatically adapt the hyper-parameters of an embedding model. More specifically, we employ the Nelder-Mead algorithm, which uses a set of heuristics to produce and exploit several potentially good configurations, from which the best one is selected and deployed. This...
Article
Full-text available
In the last few years, many works have addressed Predictive Maintenance (PdM) by the use of Machine Learning (ML) and Deep Learning (DL) solutions, especially the latter. The monitoring and logging of industrial equipment events, like temporal behavior and fault events-anomaly detection in time-series-can be obtained from records generated by senso...
Chapter
One of the most significant challenges for machine learning nowadays is the discovery of causal relationships from data. This causal discovery is commonly performed using Bayesian like algorithms. However, more recently, more and more causal discovery algorithms have appeared that do not fall into this category. In this paper, we present a new algo...
Chapter
This work aims to develop a Machine Learning framework to predict voting behaviour. Data resulted from longitudinally collected variables during the Portuguese 2019 general election campaign. Naïve Bayes (NB), and Tree Augmented Naïve Bayes (TAN) and three different expert models using Dynamic Bayesian Networks (DBN) predict voting behaviour system...
Chapter
Full-text available
Topic modeling or inference has been one of the well-known problems in the area of text mining. It deals with the automatic categorisation of words or documents into similarity groups also known as topics. In most of the social media platforms such as Twitter, Instagram, and Facebook, hashtags are used to define the content of posts. Therefore, mod...
Article
Full-text available
The usage of non-traditional data in credit scoring, from microfinance institutions, is very useful when trying to address the problem, very common in emerging markets, of the lack of a verifiable customers’ credit history. In this context, this paper relies on data acquired from smartphones in a loan classification problem. We conduct a set of exp...
Article
Full-text available
This survey paper discusses opportunities and threats of using artificial intelligence (AI) technology in the manufacturing sector with consideration for offensive and defensive uses of such technology. It starts with an introduction of Industry 4.0 concept and an understanding of AI use in this context. Then provides elements of security principle...
Article
Full-text available
Densification events in time-evolving networks refer to instants in which the network density, that is, the number of edges, is substantially larger than in the remaining. These events can occur at a global level, involving the majority of the nodes in the network, or at a local level involving only a subset of nodes.While global densification even...
Article
Full-text available
Social networks are becoming larger and more complex as new ways of collecting social interaction data arise (namely from online social networks, mobile devices sensors, ...). These networks are often large-scale and of high dimensionality. Therefore, dealing with such networks became a challenging task. An intuitive way to deal with this complexit...
Article
The number of Internet of Things devices generating data streams is expected to grow exponentially with the support of emergent technologies such as 5G networks. The online processing of these data streams therefore requires the design and development of suitable machine learning algorithms, able to learn online, as data is generated. Like their ba...
Book
Há algum tempo, a área de inteligência artificial deixou de ser vista apenas como teórica – destinada à aplicação em pequenos problemas “curiosos” – para se tornar um campo de pesquisa crescente, em busca de soluções de problemas reais da sociedade. Vencedor do Prêmio Jabuti 2012 (Categoria Tecnologia e Informática) quando foi lançado, Inteligênci...
Article
Full-text available
The significant growth of interconnected Internet‐of‐Things (IoT) devices, the use of social networks, along with the evolution of technology in different domains, lead to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In p...
Preprint
There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessmen...
Chapter
We provide a framework for analyzing geographical influence networks that have impacts on visit event sequences for a set of point-of-interests (POIs) in a city. Since mutually-exciting Hawkes processes can naturally model temporal event data and capture interactions between those events, previous work presented a probabilistic model based on Hawke...
Article
Probabilistic forecasting of distribution tails (i.e., quantiles below 0.05 and above 0.95) is challenging for non-parametric approaches since data for extreme events are scarce. A poor forecast of extreme quantiles can have a high impact in various power system decision-aid problems. An alternative approach more robust to data sparsity is extreme...