
Sylvio Barbon Junior
Associate Professor at University of Trieste
About
224 Publications
67,389 Reads
3,173 Citations
Introduction
Sylvio is an Associate Professor at the Department of Engineering and Architecture at the University of Trieste (UNITS), Italy. Currently, he is part of the Machine Learning Lab. Previously, he was the leader of the research group that studies machine learning in the Computer Science Department at State University of Londrina (UEL), Brazil.
Current institution
Additional affiliations
December 2021 - present
March 2008 - October 2012
August 2008 - October 2012
Publications (224)
We developed a wavelet-based approach for account classification that detects textual dissemination by bots on an Online Social Network (OSN). Its main objective is to match account patterns with humans, cyborgs or robots, improving the existing algorithms that automatically detect frauds. With a computational cost suitable for OSNs, the proposed a...
Traditional marbling meat evaluation is a tedious, repetitive, costly and time-consuming task performed by panellists. Alternatively, we have Computer Vision Systems (CVS) to mitigate these problems. However, most CVSs are restricted to specific environments, configurations or muscle types, and marbling scores are settled for a particular marblin...
Papaya grading is performed manually, which may lead to misclassifications, resulting in fruit boxes with different maturity stages. The objective is to predict the ripening of the papaya fruit using digital imaging and random forests. A series of physical/chemical analyses are carried out and true maturity stage is derived from pulp firmness measur...
Social interactions take place in environments that influence people’s behaviours and perceptions. Nowadays, the users of Online Social Networks (OSNs) generate a massive amount of content based on social interactions. However, the wide popularity and ease of access of OSNs created a perfect scenario for malicious activities, compromising their reli...
The need to explain predictive models is well-established in modern machine learning. However, beyond model interpretability, understanding pre-processing methods is equally essential. Understanding how data modifications impact model performance improvements and potential biases and promoting a reliable pipeline is mandatory for developing robust...
With the increasing demand for time-series analysis, driven by the proliferation of IoT devices and real-time data-driven systems, detecting change points in time series has become critical for accurate short-term prediction. The variability in patterns necessitates frequent analysis to sustain high performance by acquiring the hyperparameter. The...
Detecting anomalous executions is essential in today’s dynamic and diverse business environments. It plays a pivotal role in identifying inefficiencies, ensuring compliance, and mitigating risks associated with deviations from standard procedures. Traditional process mining techniques generally assume a linear sequence of events. However, real-worl...
Artificial Intelligence (AI) has become essential for analyzing complex data and solving highly-challenging tasks. It is being applied across numerous disciplines beyond computer science, including Food Engineering, where there is a growing demand for accurate and trustworthy predictions to meet stringent food quality standards. However, this requi...
Processing data streams is challenging due to the need for mining algorithms to adapt to real-time drifts. Ensemble strategies for concept drift detection show promise, yet gaps in flexibility and detection remain. We propose the Self-tuning Drift Ensemble (StDE) method, which dynamically adapts ensemble structure to stream changes while maintainin...
The Problem-oriented AutoML in Clustering (PoAC) framework introduces a novel, flexible approach to automating clustering tasks by addressing the shortcomings of traditional AutoML solutions. Conventional methods often rely on predefined internal Clustering Validity Indexes (CVIs) and static meta-features, limiting their adaptability and effectiven...
Dragon fruit (Selenicereus undatus) has gained popularity in the Brazilian market, thus demanding enhanced methods for quality assessment and control. Due to its unique morphological features, evaluating its quality presents a significant challenge to the industry. Image analysis, combined with Deep Learning (DL) techniques, offers a promising solu...
This chapter delves into the potential of utilising data-driven methods for soccer analysis. Soccer in particular, with its intricate player interactions and abundant data sources, serves as an ideal canvas for applying these methodologies. The core concept of the chapter revolves around establishing a data-driven pipeline in soccer and sports scien...
A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency and coherence, making them valuable for a wide range of data-related tasks fashioned as pipelines. Th...
Business processes have undergone a significant transformation with the advent of the process-oriented view in organizations. The increasing complexity of business processes and the abundance of event data have driven the development and widespread adoption of process mining techniques. However, the size and noise of event logs pose challenges that...
Milk is one of the traditional foods in the human diet. However, subclinical mastitis infection in cows could compromise the nutritional composition of milk as well as consumer safety. In this study, we investigated the possibility of implementing near infrared spectroscopy (NIRS), using a portable spectrometer, as a screening method to detect...
Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations and their complex interactions, it is common to use optimization techniques to find settings that lead to high predicti...
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance wit...
Concept drifts can occur due to various factors such as changes in the environment or the degradation of sensors, and if left undetected, they can lead to incorrect decision-making. Therefore, detecting drifts is essential to maintain the quality and effectiveness of these systems. Automatic detectors based on statistical information are usually us...
Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations, and their complex interactions, it is common to use optimization techniques to find settings that lead to high predict...
EXplainable AI (XAI) techniques can be employed to help identify points of concern in the objects analyzed when using image-based Deep Neural Networks (DNNs). There has been an increasing number of works proposing the usage of DNNs to perform Failure Analysis (FA) in various industrial applications. These DNNs support practitioners by providing an...
Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, trace clustering, etc. These methods are usually performed as preprocessing steps and are responsible for mapping complex event data information into a numerical feature space. Most papers choose existing encoding me...
Background: The search for valid information was one of the main challenges encountered during the COVID-19 pandemic, which resulted in the development of several online alternatives.
Objectives: To describe the development of a computational solution to interact with users of different levels of digital literacy on topics related to COVID-19 and t...
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with...
Non-invasive acoustic analyses of voice disorders have been at the forefront of current biomedical research. Usual strategies, essentially based on machine learning (ML) algorithms, commonly classify a subject as being either healthy or pathologically-affected. Nevertheless, the latter state is not always a result of a sole laryngeal issue, i.e., m...
The stream mining paradigm has become increasingly popular due to the vast number of algorithms and methodologies it provides to address the current challenges of Internet of Things (IoT) and modern machine learning systems. Change detection algorithms, which focus on identifying drifts in the data distribution during the operation of a machine lea...
Predictions from machine learning algorithms have often supported decision-making in industrial processes. Despite this, complex models can be challenging to interpret, sometimes shrouding the entire prediction process in an undesirable mystery. Understanding how the classifiers' recommendations are made helps human experts understand the phenomeno...
Process mining, like other disciplines inspired by data mining, uses machine learning algorithms to enhance its capabilities. Process mining solutions leverage the potential of machine learning to reduce the difficulties inherent in complex tasks. As a result, the proposal of solu...
In the context of online banking, new users have to register their information to become clients through mobile applications; this process is called digital onboarding. Fraudsters often commit identity fraud by impersonating other people to obtain access to banking services by using personal data obtained illegally and causing damage to the organis...
Supervised data stream learning depends on the incoming sample's true label to update a classifier's model. In real life, obtaining the ground truth for each instance is a challenging process; it is highly costly and time consuming. Active Learning has already bridged this gap by finding a reduced set of instances to support the creation of a relia...
Dataset built from the event logs of a fintech's digital onboarding process
Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, trace clustering, etc. These methods are usually performed as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods...
Recording anomalous traces in business processes diminishes an event log's quality. The abnormalities may represent bad execution, security issues, or deviant behavior. Focusing on mitigating this phenomenon, organizations spend efforts to detect anomalous traces in their business processes to save resources and improve process execution. However,...
A wide range of Machine Learning algorithms can model time series to address classification, forecasting, and clustering problems. However, time series may exhibit characteristics that complicate these tasks, such as repeating patterns and seasonal variations. Time series segmentation could be a solution to these problems, but current approaches ne...
Trace clustering has been extensively used to discover aspects of the data from event logs. Process Mining techniques guide the identification of sub-logs by grouping traces with similar behaviors, producing more understandable models, and improving conformance indicators. Nevertheless, little attention has been paid to the relationship among even...
Analyzing event logs generated during the execution of digital processes, organizations can monitor the behavior of dysfunctional or unspecified processes. For achieving the most refined results, high-quality and up-to-date process models are required. However, the selection of the proper process discovery algorithm is often addressed by human expe...
Similarity searches retrieve elements in a dataset with similar characteristics to the input query element. Recent works show that graph-based methods have outperformed others in the literature, such as tree-based and hash-based methods. However, graphs are highly parameter-sensitive for indexing and searching, which usually demands extra time for...
Cocoa hybridisation generates new varieties which are resistant to several plant diseases, but which have individual chemical characteristics that affect chocolate production. Image analysis is a useful method for visual discrimination of cocoa beans, while deep learning (DL) has emerged as the de facto technique for image processing. However, these algor...
Fermented cocoa bean samples were collected from three different regions of Brazil (Bahia, Espírito Santo and Pará). Samples (n = 1800) of cocoa beans were cut lengthwise in two parts to expose the cotyledon. Each individual sample was analysed and classified regarding the colour and texture of its exposed surface, as in the cut-test. Cocoa beans were c...
Multi-label data streams are sequences of multi-label instances arriving over time to a multi-label classifier. The properties of the stream may continuously change due to concept drift. Therefore, algorithms must constantly adapt to the new data distributions. In this paper we propose a novel ensemble method for multi-label drifting streams named...
Recent advances in Computer Vision and Machine Learning empowered the use of image and positional data in several high-level analyses in Sports Science, such as player action classification, recognition of complex human movements, and tactical analysis of team sports. In the context of sports action analysis, the use of positional data allows new d...
Software Defined Networking (SDN) simplifies network management and significantly reduces operational costs. SDN removes the control plane from forwarding devices (e.g., routers and switches) and centralizes this plane in a controller, enabling the management of the network forwarding decisions by programming the control plane with a high-level lan...
Choosing the most suitable algorithm to perform a machine learning task for a new problem is a recurrent and complex task. In multi-target regression tasks, when problem transformation methods are applied, this choice is even harder. The reason is the need to simultaneously choose the problem transformation method and the base learning algorithm. T...
A wide range of applications based on sequential data, named time series, have become increasingly popular in recent years, mainly those based on the Internet of Things (IoT). Several different machine learning algorithms exploit the patterns extracted from sequential data to support multiple tasks. However, this data can suffer from unreliable rea...
With data collected by Internet of Things sensors, deep learning (DL) models can forecast the generation capacity of photovoltaic (PV) power plants. This functionality is especially relevant for PV power operators and users as PV plants exhibit irregular behavior related to environmental conditions. However, DL models are vulnerable to adversarial...
A significant part of Natural Language Processing (NLP) techniques for sentiment analysis is based on supervised methods, which are affected by the quality of data. Therefore, sentiment analysis needs to be prepared for data quality issues, such as imbalance and lack of labeled data. Data augmentation methods, widely adopted in image classification...
Dominant regions are defined as the regions of the pitch that a player can reach before any other and are commonly determined neglecting the free spaces in the pitch. We presented a new approach to football players’ dominant regions analysis, based on movement models created from players’ positions, displacement, velocity, and acceleration vectors...
Trace clustering has been extensively used to preprocess event logs. By grouping similar behavior, these techniques guide the identification of sub-logs, producing more understandable models and conformance analytics. Nevertheless, little attention has been paid to the relationship between event log properties and clustering quality. In this work,...
This is an image dataset (JPEG) of papaya fruit. The objective is to predict the ripening of the papaya fruit using digital imaging. A total of 131 samples from 57 fruits are used for the experiments. These fruits are classified into three stages of maturity (EM1, EM2 and EM3). Some images present more than one acquisition, having more than one imag...
Keywords: Computational intelligence; Decision tree; Machine learning; Meta-learning; Pork quality
Pork quality classification is supported by different reference standards that are widely reported in the literature. However, selecting the most suitable standard for each type of meat sample remains a challenge, due to their intrinsic variation according to the...
Highly heterogeneous and fast-arriving large amounts of data, also known as Big Data, have driven the development of novel Data Management technologies. In this paper, the members of the IFIP Working Group 2.6 share their expertise in some of these technologies, focusing on: recent advancements in data integration, metadata management, data quality, g...
Anomalous traces diminish the event log’s quality due to bad execution or security issues, for instance. Focusing on mitigating this phenomenon, organizations spend efforts to detect anomalous traces in their business processes to save resources and improve process execution. Conformance checking techniques are usually employed in these situations....
Identifying anomalies in business processes is a challenge organizations face daily and is critical for their operations’ data flow, whether public or private. Most current techniques face this challenge by requiring prior knowledge about business process models or specialist interventions to support the usage of state-of-the-art methods, such as...
Process discovery methods have obtained remarkable achievements in Process Mining, delivering comprehensible process models to enhance management capabilities. However, selecting the suitable method for a specific event log highly relies on human expertise, hindering its broad application. Solutions based on Meta-learning (MtL) have been promising...
Energy dispersive X-ray fluorescence (EDXRF) is one of the quickest, most environmentally friendly and least expensive spectroscopic analytical methodologies for assessing soil quality parameters. However, challenges in EDXRF spectral data analysis still demand more efficient methods. One possible solution is using Machine Learning (ML), particularly...
Blockchain food traceability is currently among the trending topics of the global agrifood segment. The possibility of rapidly assessing the full information of a food product, from its origin to the final consumer destination, covering every single point of its journey through the food chain, makes blockchain a promising disruptive tool that offer...
TC 2: Software: Theory and Practice
Fermentation of cocoa beans is a critical step for chocolate manufacturing, since fermentation influences the development of flavour, affecting components such as free amino acids, peptides and sugars. The degree of fermentation is determined by visual inspection of changes in the internal colour and texture of beans, through the cut-test. Although...
Encoding methods affect the performance of process mining tasks but little work in the literature focused on quantifying their impact. In this paper, we compare 10 different encoding methods from three different families (trace replay and alignment, graph embeddings, and word embeddings) using measures to evaluate the overlaps in the feature space,...
Data augmentation is a widely adopted method for improving model performance in image classification tasks. Although it is still not as ubiquitous in the Natural Language Processing (NLP) community, some methods have already been proposed to increase the amount of training data using simple text transformations or text generation through language models....
As technology evolves and electronic devices become widespread, the amount of data produced in the form of streams increases enormously. Data streams are an online source of data, meaning that they keep producing data continuously. This creates the need for fast and reliable methods to analyse and extract information from these sources....
Forecasting photovoltaic (PV) power generation, as in many other time series scenarios, is a challenging task. Most current solutions for time series forecasting are grounded on Machine Learning (ML) algorithms, which usually outperform statistical-based methods. However, solutions based on ML and, more recently, Deep Learning (DL) have been found...
Assuring anomaly-free business process executions is a key challenge for many organizations. Traditional techniques address this challenge using prior knowledge about anomalous cases that is seldom available in real-life. In this work, we propose the usage of word2vec encoding and One-Class Classification algorithms to detect anomalies by relying o...
Spectral methods usually produce a large amount of data, and have been widely applied to food and agricultural products. These products demand several analyses to determine different parameters that will further indicate their quality. There have been several approaches reported to deal with multi-target regression in recent years, with different...
A great concern for organizations is to detect anomalous process instances within their business processes. For that, conformance checking performs model-aware analysis by comparing process logs to business models for the detection of anomalous process executions. However, in several scenarios, a model is either unavailable or its generation is cos...
Due to the high production of complex data, the last decades have provided a huge advance in the development of similarity search methods. Recently graph-based methods have outperformed other ones in the literature of approximate similarity search. However, a graph employed on a dataset may present different behaviors depending on its parameters. T...
Machine Learning (ML) algorithms have been successfully employed by a vast range of practitioners with different backgrounds. One of the reasons for ML's popularity is its capability to consistently deliver accurate results, which can be further boosted by adjusting hyperparameters (HP). However, some practitioners have limited knowledge about the...
Recently, there has been a significant increase in the diffusion of fake news worldwide, especially in politics, where misinformation can be propagated and surface in election debates around the world. However, news with a recreational purpose, such as satirical news, is often confused with objective fake news. In thi...
Organisations have seen a rise in the volume of data corresponding to business processes being recorded. Handling process data is a meaningful way to extract relevant information from business processes with impact on the company's values. Nonetheless, business processes are subject to changes during their executions, adding complexity to their ana...
The environment's composition can have an impact on human behavior; however, this relationship remains uncertain until the cities' qualities and landscape can be analyzed empirically. Images obtained through Google Street View (GSV) enable a large volume of data for automated assessment of environmental characteristics. Deep learning techniques hav...
The composition of the environment can have an impact on behaviour; however, this relationship remains uncertain until the qualities and landscape of cities can be analysed empirically. Images obtained through Google Street View (GSV) provide a large volume of data for the automated assessment of environmental characteristics. Techniq...
Online process mining refers to a class of techniques for analyzing in real-time event streams generated by the execution of business processes. These techniques are crucial in the reactive monitoring of business processes, timely resource allocation and detection/prevention of dysfunctional behavior. Many interesting advances have been made by the...
This chapter addresses the application of machine learning algorithms to detect attacks against smart grids. Smart grids are the result of a long process of transformation that power systems have been through, relying on Information and Communication Technology (ICT) to improve their monitoring and control. Although an objective of this convergence...
Software-defined Networking (SDN) has emerged as an architecture that uses applications to make networks flexible and centrally controlled. Although SDN provides innovative management, it is still susceptible to attacks daily. Traditional detection approaches may not be sufficient to contain these threats. In this paper, we present an Artifici...
Online Social Media (OSM) have been substantially transforming the process of spreading news, improving its speed, and reducing barriers toward reaching out to a broad audience. However, OSM are very limited in providing mechanisms to check the credibility of news propagated through their structure. The majority of studies on automatic fake news de...
Edge computing (EC) is a promising technology capable of bridging the gap between Cloud computing services and the demands of emerging technologies such as the Internet of Things (IoT). Most EC-based solutions, from wearable devices to smart cities architectures, benefit from Machine Learning (ML) methods to perform various tasks, such as classific...
In this study, we developed a robust automatic computer vision system for marbling meat segmentation. Our approach can segment intramuscular fat from meat samples using images acquired with devices of different quality in an environment with varying illumination, including both external ambient light and artificial light; thus, professionals can apply thi...
Several applications of supervised learning involve the prediction of multiple continuous target variables from a dataset. When the target variables exhibit statistical dependencies among them, multi-target regression (MTR) modelling can improve predictive performance compared with inducing a separate model for each target. Apart from...
Machine Learning (ML) algorithms have been used for assessing soil quality parameters along with non-destructive methodologies. Among spectroscopic analytical methodologies, energy dispersive X-ray fluorescence (EDXRF) is one of the quickest, most environmentally friendly and least expensive when compared to conventional methods. However, some challeng...
Veins in pork thigh carcass are directly related to the quality of dry-cured ham, and consequently to its market value. Some veining defects over the surface of raw ham are easily detected by humans and precisely assessed by a specialist. However, the automatic evaluation of raw ham quality by image analysis poses some challenges to the traditional...
Background
Voice disorders are related to both modest and severe health problems, including discomfort, pain, difficulty speaking, dysphagia and also cancer. Widely adopted worldwide, the combined invasive and subjective diagnosis of voice disorders is troublesome and error-prone. Contrarily, acoustic-based digital assessment allows for a non-intru...
Image segmentation is a key issue in image processing. New image segmentation algorithms have been proposed in recent years. However, there is no optimal algorithm for every image processing task. The selection of the most suitable algorithm usually occurs by testing every possible algorithm or using knowledge from previous problems. These proces...
Several multi-target regression methods have been developed in recent years, aiming at improving predictive performance by exploring inter-target correlation within the problem. However, none of these methods outperforms the others for all problems. This motivates the development of automatic approaches to recommend the most suitable multi-target regr...
Questions (4)
I'm looking for this kind of database to apply some data mining approaches! Thanks.
Recently I read about a project on the classification of clinical depression based on glottal features. The paper mentioned a voice database with diagnoses. Where can I find this kind of database?
Mahout is an Apache Software Foundation project for building scalable machine learning libraries.
Regarding accuracy and performance.