About
569
Publications
316,565
Reads
23,809
Citations
Publications (569)
To maintain the performance of the latest generation of onshore and offshore wind turbine systems, a new methodology must be proposed to enhance the maintenance policy. In this context, this paper introduces an approach to designing a decision support tool that combines predictive capabilities with anomaly explanations for effective IoT predictive...
Image segmentation for detecting illegal landfill waste in aerial images is essential for environmental crime monitoring. Despite advancements in segmentation models, the primary challenge in this domain is the lack of annotated data due to the unknown locations of illegal waste disposals. This work mainly focuses on evaluating segmentation models...
Effective anomaly detection in telecommunication networks is essential for securing digital transactions and supporting the sustainability of our global information ecosystem. However, the volume of data in such high-speed distributed environments imposes strict latency and scalability requirements on anomaly detection systems. This study focuses o...
Predictive Maintenance (PdM) has emerged as one of the pillars of Industry 4.0 and has become crucial for enhancing operational efficiency, making it possible to minimize downtime, extend equipment lifespan, and prevent failures. A wide range of PdM tasks can be performed using Artificial Intelligence (AI) methods, which often use data generated from industrial...
We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. For such, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical...
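The idea of modeling market structure as an asset network built from return co-movement can be sketched as follows. This is only an illustration under stated assumptions, not the paper's model: co-movement is taken to be the Pearson correlation over a fixed window, an edge is kept when its absolute value reaches a threshold, and the names `pearson`, `correlation_edges`, `window`, and `threshold` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_edges(returns, start, window, threshold=0.5):
    """Build one network snapshot from a window of asset returns.

    returns: dict mapping asset name -> list of returns (assumed layout).
    An edge (a, b) is kept when |correlation| >= threshold.
    """
    names = sorted(returns)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(returns[a][start:start + window],
                        returns[b][start:start + window])
            if abs(r) >= threshold:
                edges[(a, b)] = r
    return edges


returns = {
    "A": [0.01, 0.02, -0.01, 0.03],
    "B": [0.02, 0.04, -0.02, 0.06],
    "C": [0.01, -0.01, 0.01, -0.01],
}
snapshot = correlation_edges(returns, start=0, window=4, threshold=0.9)
```

Sliding `start` forward one step at a time would give a sequence of snapshots, that is, a dynamic network whose link and node features could then feed a forecasting model.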
Hyper-parameter tuning of machine learning models has become a crucial task in achieving optimal results in terms of performance. Several researchers have explored the optimisation task during the last decades to reach a state-of-the-art method. However, most of them focus on batch or offline learning, where data distributions do not change arbitra...
Over the past few decades, road transportation emissions have increased. Vehicles are among the most significant sources of pollutants in urban areas. As such, several studies and public policies emerged to address the issue. Estimating greenhouse emissions and air quality over space and time is crucial for human health and mitigating climate chang...
E-commerce has become an essential aspect of modern life, providing consumers worldwide with convenience and accessibility. However, the high volume of short and noisy product descriptions in text streams of massive e-commerce platforms translates into an increased number of clusters, presenting challenges for standard model-based stream clustering...
Trajectory clustering is one of the most important issues in mobility patterns data mining. It is applied in several cases such as hot-spots detection, urban transportation control, animal migration movements, and tourist visiting routes among others. In this paper, we describe how to identify the most frequent trajectories from raw GPS data. By ma...
The transition to Industry 4.0 provoked a transformation of industrial manufacturing with a significant leap in automation and intelligent systems. This paradigm shift has brought about a mindset that emphasizes predictive maintenance: detecting future failures when current behaviour of industrial processes and machines is thought to be normal. The...
Explainable Artificial Intelligence (XAI) fills the role of a critical interface fostering interactions between sophisticated intelligent systems and diverse individuals, including data scientists, domain experts, end-users, and more. It aids in deciphering the intricate internal mechanisms of "black box" Machine Learning (ML), rendering the reas...
The emergence of the Industry 4.0 trend brings automation and data exchange to industrial manufacturing. Using computational systems and IoT devices allows businesses to collect and deal with vast volumes of sensor and business process data. The growth and proliferation of big data and machine learning technologies enable strategic decisions ba...
The human brain works in such a complex way that we have not yet managed to decipher its functional mysteries. It has five main channels that act as information input: the senses. Sight, hearing, taste, smell, and touch generate information that flows from their corresponding receptors, i.e., the eyes, ears, tongue, nose, and skin, that help us und...
As the digital world grows, data is being collected at high speed on a continuous and real-time scale. Hence, the imposed imbalanced and evolving scenario that introduces learning from streaming data remains a challenge. As the research field is still open to consistent strategies that assess continuous and evolving data properties, this paper prop...
In real-world scenarios, many phenomena produce a collection of events that occur in continuous time. Point Processes provide a natural mathematical framework for modeling these sequences of events. In this survey, we investigate probabilistic models for modeling event sequences through temporal processes. We revisit the notion of event modeling and...
Water is a fundamental human resource and its scarcity is reflected in social, economic and environmental problems. Water used in human activities must be treated before reusing or returning to nature. This treatment takes place in wastewater treatment plants (WWTPs), which need to perform their functions with high quality, low cost, and reduced en...
Road transportation emissions have increased in the last few decades and have been the primary source of pollutants in urban areas with ever-growing populations. In this context, it is important to have effective measures to monitor road emissions in regions. Creating an emission inventory over a region that can map the road emission based on the v...
The demand for high-performance solutions for anomaly detection and fault-event forecasting is increasing in industry. Detecting and forecasting faults from time-series data is a critical task in Internet of Things (IoT) data mining. The classical fault detection approaches based on physical modelling are limited to some m...
The growing use of data-driven decision systems based on Artificial Intelligence (AI) by governments, companies and social organizations has drawn more attention to the challenges they pose to society. Over the last few years, news on social media about discrimination and privacy, among other issues, has highlighted their vulnerabilities. Despite a...
Predictive Maintenance applications are increasingly complex, with interactions between many components. Black-box models are popular approaches due to their predictive accuracy and are based on deep-learning techniques. This paper presents an architecture that uses an online rule learning algorithm to explain when the black-box model predicts rare...
An online data-driven predictive maintenance approach for railway switches using data logs obtained from the interlocking system of the railway infrastructure is proposed in this paper. The proposed approach is described in detail and consists of a two-phase process: anomaly detection and remaining useful life prediction. The approach is applied to...
Federated learning (FL) is a collaborative, decentralized, privacy‐preserving method to address the challenges of data storage and data privacy. Artificial intelligence, machine learning, smart devices, and deep learning have strongly marked recent years. Two challenges arose in data science as a result. First, the regulation protected the data by...
The paper describes the MetroPT data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to develop machine learning methods for online anomaly detection and failure prediction. Several analog sensor signals (pressure, temperature, current consumpti...
More and more applications require early decisions, i.e. taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem to hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularl...
Influence Analysis is one of the well‐known areas of Social Network Analysis. However, discovering influencers from micro‐blog networks based on topics has gained recent popularity due to its specificity. Besides, these data networks are massive, continuous and evolving. Therefore, to address the above challenges we propose a dynamic framework for...
Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes...
Analyzing the way individuals move is fundamental to understanding the dynamics of humanity. Transportation mode plays a significant role in human behavior as it changes how individuals travel, how far, and how often they can move. The identification of transportation modes can be used in many applications and it is a key component of the internet of...
In-hospital deaths and deaths after ICU discharge are common, given the severity of the condition under which many patients are admitted to these wings. Because of this, there is an urgent need to identify and follow these cases closely. Furthermore, as ICU data is usually composed of variables measured in varying time intervals, there is a need for a method that...
The paper describes the Railway data set, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected between 2020 and 2022 with the aim of developing machine learning methods for online anomaly detection and failure prediction. By capturing several analog sensor signals (...
With the growth of 5G networks, the Internet-of-Things is becoming a reality. The advances in networking, machine learning, data analytics, and robotics largely improve industrial processes. Industry 4.0 is a term for the fourth industrial revolution: the digitization and automation of manufacturing. Predictive maintenance is one of the techniques...
There has been a significant effort by the research community to address the problem of providing methods to organize documentation with the help of Information Retrieval methods. In this report, we present several experiments with some stream analysis methods to explore streams of text documents. We use only dynamic algorithms to explore, an...
This study applies a data-driven anomaly detection framework based on a Long Short-Term Memory (LSTM) autoencoder network for several subsystems of a public transport bus. The proposed framework efficiently detects abnormal data, significantly reducing the false alarm rate compared to available alternatives. Using historical repair records, we demo...
Non-traditional data like the applicant’s bank statement is a significant source for decision-making when granting loans. We find that we can use methods from network science on the applicant’s bank statements to convert inherent cash flow characteristics to predictors for default prediction in a credit scoring or credit risk assessment model. Firs...
The Internet of Things (IoT) envisions a smart environment powered by connectivity and heterogeneity where ensuring reliable services and communications across multiple industries, from financial fields to healthcare and fault detection systems, is a top priority. In such fields, data is being collected and broadcast at high speed on a continuous a...
Causality is a complex concept, which roots its developments across several fields, such as statistics, economics, epidemiology, computer science, and philosophy. In recent years, the study of causal relationships has become a crucial part of the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of co...
Systems based on Artificial Intelligence, namely data-driven decision systems, have been used in the private sector in areas such as retail, finance, and telecommunications. More recently, data-driven decision systems started to be applied in different areas of public interest, such as health, urban planning, education, criminal justice, and public...
This article presents our recent work on the topic of learning from data streams. We focus on emerging topics, including fraud detection, learning from rare cases, and hyper-parameter tuning for streaming data.
The increased development of urban areas results in a larger number of vehicles on the road network, leading to traffic congestion, which often leads to potentially dangerous situations that can be described as anomalies. The tensor-based methods emerged only recently in applications related to traffic anomaly detection. They outperform other model...
There has been a significant effort by the research community to address the problem of providing methods to organize documentation, with the help of Information Retrieval methods. In this paper, we present several experiments with stream analysis methods to explore streams of text documents. This paper also presents possible architectures of the T...
In recent years data stream mining and learning from imbalanced data have been active research areas. Even though solutions exist to tackle these two problems, most of them are not designed to handle challenges inherited from both problems. As far as we are aware, the few approaches in the area of learning from imbalanced data streams fall in the c...
Sharing of telecommunication network data, for example, even at high aggregation levels, is nowadays highly restricted due to privacy legislation and regulations and other important ethical concerns. It leads to scattering data across institutions, regions, and states, inhibiting the usage of AI methods that could otherwise take advantage of data a...
Typically, classification algorithms use correlation analysis to make decisions. However, these decisions and the models they learn are not easily understandable for the typical user. Causal discovery is the field that studies the means to find causal relationships in observational data. Although highly interpretable, causal discovery algorithms te...
Motivated by the challenges of Big Data, this paper presents an approximative algorithm to assess the Kolmogorov-Smirnov test. This goodness of fit statistical test is extensively used because it is non-parametric. This work focuses on the one-sample test, which considers the hypothesis that a given univariate sample follows some reference distribu...
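For orientation, the one-sample Kolmogorov-Smirnov statistic is the largest gap between a sample's empirical distribution function and the reference CDF. The sketch below computes the exact statistic in plain Python; it is not the approximate algorithm the paper proposes, and the names `ks_statistic` and `normal_cdf` are illustrative assumptions.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as an example reference."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Exact one-sample KS statistic D = sup_x |F_n(x) - F(x)|.

    The empirical CDF jumps at each sorted point, so the gap is checked
    just before (i / n) and just after ((i + 1) / n) every jump.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


d_uniform = ks_statistic([0.25, 0.75], uniform_cdf)  # 0.25
d_normal = ks_statistic([0.0], normal_cdf)           # 0.5
```

The sort makes this O(n log n) and requires the whole sample in memory, which is the kind of cost an approximate, Big Data oriented variant would aim to reduce.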
Numerous machine learning applications involve dealing with imbalanced domains, where the learning focus is on the least frequent classes. This imbalance introduces new challenges for both the performance assessment of these models and their predictive modeling. While several performance metrics have been established as baselines in balanced domain...
We present an online optimization method for time-evolving data streams that can automatically adapt the hyper-parameters of an embedding model. More specifically, we employ the Nelder-Mead algorithm, which uses a set of heuristics to produce and exploit several potentially good configurations, from which the best one is selected and deployed. This...
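As a point of reference, the Nelder-Mead heuristics (reflect, expand, contract, shrink) can be sketched in a few lines. This is a simplified batch version with inside contraction only, not the streaming adaptation the paper describes; the objective `loss` is a hypothetical stand-in for a model's validation error as a function of two hyper-parameters.

```python
def nelder_mead(f, start, step=0.5, iters=200):
    """Minimise f with a simplified Nelder-Mead simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one offset point per dimension.
    simplex = [list(start)]
    for i in range(n):
        p = list(start)
        p[i] += step
        simplex.append(p)

    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)  # halfway between worst and centroid
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


def loss(params):
    """Hypothetical loss surface with its minimum at (1, -2)."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


best_config = nelder_mead(loss, [0.0, 0.0])
```

Each simplex vertex plays the role of one candidate configuration, and the best vertex is the one that would be selected and deployed.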
In the last few years, many works have addressed Predictive Maintenance (PdM) by the use of Machine Learning (ML) and Deep Learning (DL) solutions, especially the latter. The monitoring and logging of industrial equipment events, like temporal behavior and fault events (anomaly detection in time series), can be obtained from records generated by senso...
One of the most significant challenges for machine learning nowadays is the discovery of causal relationships from data. This causal discovery is commonly performed using Bayesian-like algorithms. However, more recently, more and more causal discovery algorithms have appeared that do not fall into this category. In this paper, we present a new algo...
This work aims to develop a Machine Learning framework to predict voting behaviour. The data consist of variables collected longitudinally during the Portuguese 2019 general election campaign. Naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN), and three different expert models using Dynamic Bayesian Networks (DBN) predict voting behaviour system...
Topic modeling or inference has been one of the well-known problems in the area of text mining. It deals with the automatic categorisation of words or documents into similarity groups also known as topics. In most of the social media platforms such as Twitter, Instagram, and Facebook, hashtags are used to define the content of posts. Therefore, mod...
The usage of non-traditional data in credit scoring, from microfinance institutions, is very useful when trying to address a problem that is very common in emerging markets: the lack of a verifiable credit history for customers. In this context, this paper relies on data acquired from smartphones in a loan classification problem. We conduct a set of exp...
This survey paper discusses opportunities and threats of using artificial intelligence (AI) technology in the manufacturing sector with consideration for offensive and defensive uses of such technology. It starts with an introduction of the Industry 4.0 concept and an understanding of AI use in this context. It then provides elements of security principle...
Densification events in time-evolving networks refer to instants in which the network density, that is, the number of edges, is substantially larger than at the remaining instants. These events can occur at a global level, involving the majority of the nodes in the network, or at a local level involving only a subset of nodes. While global densification even...
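The notion of a global densification event can be illustrated with a small sketch. It follows the abstract in taking density to be the number of edges per snapshot; the decision rule (edge count above `factor` times the median of the other snapshots) and the names `densification_events` and `factor` are assumptions made for illustration, not the paper's method.

```python
from statistics import median


def densification_events(snapshots, factor=2.0):
    """Return indices of snapshots whose edge count is unusually large.

    snapshots: list of edge lists, one per time instant; each edge is a
    pair of node ids and is treated as undirected.
    """
    counts = [len({frozenset(edge) for edge in edges}) for edges in snapshots]
    events = []
    for t, count in enumerate(counts):
        others = counts[:t] + counts[t + 1:]
        if others and count > factor * median(others):
            events.append(t)
    return events


snapshots = [
    [(1, 2)],
    [(1, 2), (2, 3)],
    [(1, 2), (2, 3), (3, 4), (1, 3), (1, 4), (2, 4)],
    [(2, 3)],
]
events = densification_events(snapshots)  # [2]
```

A local variant would apply the same comparison to the edges incident to a chosen subset of nodes rather than to the whole network.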
Social networks are becoming larger and more complex as new ways of collecting social interaction data arise (namely from online social networks, mobile device sensors, ...). These networks are often large-scale and of high dimensionality. Therefore, dealing with such networks has become a challenging task. An intuitive way to deal with this complexit...
The number of Internet of Things devices generating data streams is expected to grow exponentially with the support of emergent technologies such as 5G networks. The online processing of these data streams therefore requires the design and development of suitable machine learning algorithms, able to learn online, as data is generated. Like their ba...
For some time now, the field of artificial intelligence has no longer been seen as merely theoretical, intended for application to small "curious" problems, and has become a growing field of research in search of solutions to real problems of society.
Winner of the Prêmio Jabuti 2012 (Technology and Informatics category) when it was released, Inteligênci...
The significant growth of interconnected Internet‐of‐Things (IoT) devices, the use of social networks, along with the evolution of technology in different domains, lead to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In p...
There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessmen...
We provide a framework for analyzing geographical influence networks that have impacts on visit event sequences for a set of point-of-interests (POIs) in a city. Since mutually-exciting Hawkes processes can naturally model temporal event data and capture interactions between those events, previous work presented a probabilistic model based on Hawke...
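To make the mutually-exciting mechanism concrete, the sketch below evaluates the conditional intensity of a multivariate Hawkes process, in which each past visit at POI j raises the visit rate at POI i by an amount that decays over time. The exponential kernel, the shared decay rate `beta`, and the names `intensity`, `mu`, and `alpha` are assumptions for illustration; the abstract does not state the kernel used.

```python
import math


def intensity(i, t, events, mu, alpha, beta=1.0):
    """Conditional intensity of POI i at time t.

    lambda_i(t) = mu[i] + sum over past events (s, j) with s < t of
                  alpha[i][j] * exp(-beta * (t - s))

    events: list of (time, poi_index) pairs.
    mu[i]: baseline visit rate of POI i.
    alpha[i][j]: influence of a visit at POI j on future visits at POI i.
    """
    excitation = sum(
        alpha[i][j] * math.exp(-beta * (t - s))
        for s, j in events
        if s < t
    )
    return mu[i] + excitation


mu = [0.1, 0.2]
alpha = [[0.0, 0.5],
         [0.3, 0.0]]
events = [(1.0, 1)]  # one visit at POI 1 at time 1.0

rate_poi0 = intensity(0, 2.0, events, mu, alpha)  # excited by the POI 1 visit
rate_poi1 = intensity(1, 2.0, events, mu, alpha)  # baseline only
```

Read this way, the matrix `alpha` is the influence network: a nonzero `alpha[i][j]` is a directed edge from POI j to POI i.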
Probabilistic forecasting of distribution tails (i.e., quantiles below 0.05 and above 0.95) is challenging for non-parametric approaches since data for extreme events are scarce. A poor forecast of extreme quantiles can have a high impact in various power system decision-aid problems. An alternative approach more robust to data sparsity is extreme...