Paolo Ceravolo
University of Milan | UNIMI · Department of Computer Science
Ph.D.
202 Publications
49,288 Reads
2,856 Citations
Publications (202)
In this paper, we propose an innovative approach to thoroughly explore dataset features that introduce bias in downstream machine-learning tasks. Depending on the data format, we use different techniques to map instances into a similarity feature space. Our method's ability to adjust the resolution of pairwise similarity provides clear insights int...
This paper investigates the effectiveness of GPT-4o-2024-08-06, the latest Large Language Model (LLM) from OpenAI, in detecting business process anomalies, with a focus on rework anomalies. In our study, we developed a GPT-4o-based tool capable of transforming event logs into a structured format and identifying reworked activities within business e...
Predictive Process Monitoring (PPM) extends classical process mining techniques by providing predictive models that can be applied at runtime during the execution of a business process, for example, to predict the sequence of the next event(s) in a case, its outcomes, or performance-related aspects such as the remaining processing time. These predi...
Achieving a comprehensive view of a patient’s health using data from Electronic Health Record systems requires the use of advanced analytics. However, effectively managing and curating this data requires carefully designed workflows. While digitization and standardization enable continuous health monitoring, issues such as missing data values and t...
A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency and coherence, making them valuable for a wide range of data-related tasks fashioned as pipelines. Th...
To gain a comprehensive understanding of a patient’s health, advanced analytics must be applied to the data collected by electronic health record (EHR) systems. However, managing and curating this data requires carefully designed workflows. While digitalization and standardization enable continuous health monitoring, missing data values and technic...
Citation: Grigore, I.M.; Tavares, G.M.; Silva, M.C.d.; Ceravolo, P.; Barbon Junior, S. Automated Trace Clustering Pipeline Synthesis in Process Mining. Information 2024, 15, 241. https:// Abstract: Business processes have undergone a significant transformation with the advent of the process-oriented view in organizations. The increasing complexity...
As companies store, process, and analyse bigger and bigger volumes of highly heterogeneous data, novel research and technological challenges are emerging. Traditional and rigid data integration and processing techniques become inadequate for a new class of data-intensive applications. There is a need for new architectural, software, and hardware so...
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance wit...
Machine Learning is a powerful tool for uncovering relationships and patterns within datasets. However, applying it to large datasets can lead to biased outcomes and quality issues, due to confounder variables indirectly related to the outcome of interest. Achieving fairness often alters training data, like balancing imbalanced groups (privileged...
The heterogeneous data acquired by educational institutes about students’ careers (e.g., performance scores, course preferences, attendance record, demographics, etc.) has been a source of investigation for Educational Data Mining (EDM) researchers for over two decades. EDM researchers have primarily focused on course-specific data analyses of stud...
Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, trace clustering, etc. These methods are usually performed as preprocessing steps and are responsible for mapping complex event data information into a numerical feature space. Most papers choose existing encoding me...
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with...
Process simulation is an analysis tool in process mining that allows users to measure the impact of changes, prevent losses, and update the process without risks or costs. In the literature, several process simulation techniques are available and they are usually built upon process models discovered from a given event log or learned via deep learni...
In the context of online banking, new users have to register their information to become clients through mobile applications; this process is called digital onboarding. Fraudsters often commit identity fraud by impersonating other people to obtain access to banking services by using personal data obtained illegally and causing damage to the organis...
Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, trace clustering, etc. These methods are usually performed as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods...
Trace clustering has been extensively used to discover aspects of the data from event logs. Process Mining techniques guide the identification of sub-logs by grouping traces with similar behaviors, producing more understandable models, and improving conformance indicators. Nevertheless, little attention has been paid to the relationship among even...
In this paper, we present a knowledge-based approach for legal document retrieval based on the organization of a textual data repository and on document embedding models. Pre-processed and embedded documents are iteratively classified at sentence level through a terminology extraction and concept formation cycle, using a zero-knowledge approach tha...
Multimodal Analytics in Big Data architectures implies compounded configurations of the data processing tasks. Each modality in data requires specific analytics that triggers specific data processing tasks. Scalability can be reached at the cost of an attentive calibration of the resources shared by the different tasks searching for a trade-off wit...
Continuous monitoring of the well-being state of elderly people is about to become an urgent need in the near future due to population aging. Aiming at a unified notion of well-being, we find the Intrinsic Capacity concept in accordance with the SMART BEAR project goals. In this study, we mainly focus on the enabling infrastructure, mapping our model...
Data has become fundamental to every business process and research area like never before. To date, one of the main open points of research activities is to manage the data acquired in the field by sensors, logs etc. by modeling the data structures according to the analyses that will be carried out. In fact, with the advent of Big Data, the need to...
Students’ engagements reflect their level of involvement in an ongoing learning process which can be estimated through their interactions with a computer-based learning or assessment system. A pre-requirement for stimulating student engagement lies in the capability to have an approximate representation model for comprehending students’ varied (dis...
Background: Over recent years, interest in the development of smart health technologies aimed at supporting independent living for older populations has increased. The integration of innovative technologies, such as the Internet of Things, wearable technologies, artificial intelligence, and ambient-assisted living applications, represents a valuabl...
Criminal investigation adopts Artificial Intelligence to enhance the volume of the facts that can be investigated and documented in trials. However, the abstract reasoning implied in legal justification and argumentation requires adopting solutions providing high precision, low generalization error, and retrospective transparency. Three requirement...
Traditional Process Mining offers batch analysis of business processes but does not transpose smoothly into online environments due to specific design constraints. Techniques adapted to support online analysis require peculiar adjustments that inherently restrict their focus to a single task. In this work, we extend the Concept Drift in Event Strea...
Continuous monitoring of the well-being state of elderly people is about to become an urgent need in the near future due to population aging. Aiming at a unified notion of well-being, we find the Intrinsic Capacity concept in accordance with the SMART BEAR project goals. In this study, we mainly focus on the enabling infrastructure, mapping our models...
Trace clustering has been extensively used to preprocess event logs. By grouping similar behavior, these techniques guide the identification of sub-logs, producing more understandable models and conformance analytics. Nevertheless, little attention has been paid to the relationship between event log properties and clustering quality. In this work,...
Highly-heterogeneous and fast-arriving large amounts of data, otherwise said Big Data, induced the development of novel Data Management technologies. In this paper, the members of the IFIP Working Group 2.6 share their expertise in some of these technologies, focusing on: recent advancements in data integration, metadata management, data quality, g...
Identifying anomalies in business processes is a challenge organizations face daily and is critical for their operations’ data flow, whether public or private. Most current techniques face this challenge by requiring prior knowledge about business process models or specialist interventions to support the usage of state-of-the-art methods, such as...
Background: Over recent years, interest in the development of smart health technologies aimed at supporting independent living for older populations has increased. The integration of innovative technologies, such as the Internet of Things, wearable technologies, artificial intelligence, and ambient-assisted living applications, represents a valuab...
Process discovery methods have obtained remarkable achievements in Process Mining, delivering comprehensible process models to enhance management capabilities. However, selecting the suitable method for a specific event log highly relies on human expertise, hindering its broad application. Solutions based on Meta-learning (MtL) have been promising...
Online Social Network (OSN) is considered a key source of information for real-time decision making. However, several constraints lead to decreasing the amount of information that a researcher can have while increasing the time of social network mining procedures. In this context, this paper proposes a new framework for sampling Online Social Netwo...
TC 2: Software: Theory and Practice
Due to the current emergency situation, caused by COVID-19, the scientific literature on the topic has rapidly grown. At the same time, purposeful and targeted research plans with strong background knowledge are urgently needed. However, the huge number of documents produced by multiple communities generates a fragmented terminology that may cause c...
Knowledge graphs are exploited in criminal investigation to integrate heterogeneous data sources and scale up the operational efficiency of enquiry protocols. Using a declarative perspective, protocols can be viewed as a set of data ingestion procedures and nested exact queries. This meets the probating nature of procedural justice that has to proc...
As now well established, the world population is aging rapidly and, according to the World Health Organization (WHO), the number of people aged 60 years and older is expected to total 2 billion in 2050. For this reason, an emerging important issue is the definition of a new generation of healthcare platforms capable of monitoring people's quality of li...
Encoding methods affect the performance of process mining tasks but little work in the literature focused on quantifying their impact. In this paper, we compare 10 different encoding methods from three different families (trace replay and alignment, graph embeddings, and word embeddings) using measures to evaluate the overlaps in the feature space,...
Assuring anomaly-free business process executions is a key challenge for many organizations. Traditional techniques address this challenge using prior knowledge about anomalous cases that is seldom available in real-life. In this work, we propose the usage of word2vec encoding and One-Class Classification algorithms to detect anomalies by relying o...
Real-time response is crucial in many business process scenarios; however, few tools support the online processing of Process Mining tasks. In this paper, we present Concept Drift in Event Stream Framework (CDESF), a tool focused on concept drift detection that also supports several online Process Mining tasks. CDESF highlights the process model ev...
Assuring anomaly-free business process executions is a key challenge for many organizations. Traditional techniques address this challenge using prior knowledge about anomalous cases that is seldom available in real-life. In this work, we propose the usage of word2vec encoding and One-Class Classification algorithms to detect anomalies by relying o...
Research on database and information technologies has been rapidly evolving over the last couple of years. This evolution was led by three major forces: Big Data, AI and Connected World that open the door to innovative research directions and challenges, yet exploiting four main areas: (i) computational and storage resource modeling and organizati...
Organisations have seen a rise in the volume of data corresponding to business processes being recorded. Handling process data is a meaningful way to extract relevant information from business processes with impact on the company's values. Nonetheless, business processes are subject to changes during their executions, adding complexity to their ana...
Online process mining refers to a class of techniques for analyzing in real-time event streams generated by the execution of business processes. These techniques are crucial in the reactive monitoring of business processes, timely resource allocation and detection/prevention of dysfunctional behavior. Many interesting advances have been made by the...
This article aims at introducing a new process-centric, trusted, configurable and multipurpose electronic voting service based on the blockchain infrastructure. The objective is to design an e-voting service using blockchain able to automatically translate service configuration defined by the end-user into a cloud-based deployable bundle, automatin...
Ph.D. Thesis presented at the Computer Science Department, University of Milan, Italy
This book constitutes revised selected papers from the 8th and 9th IFIP WG 2.6 International Symposium on Data-Driven Process Discovery and Analysis, SIMPDA 2018, held in Seville, Spain, on December 13–14, 2018, and SIMPDA 2019, held in Bled, Slovenia, on September 8, 2019.
From 16 submissions received for SIMPDA 2018 and 9 submissions received fo...
Business Processes facilitate the execution of a set of activities to achieve the strategic plans of a company. During the execution of a business process model, several decisions can be made that frequently involve the values of the input data of certain activities. The decision regarding the value of these input data concerns not only the correct...
Data transformation and schema conciliation are relevant topics in Industry due to the incorporation of data-intensive business processes in organizations. As the amount of data sources increases, the complexity of such data increases as well, leading to complex and nested data schemata. Nowadays, novel approaches are being employed in academia and...
Social Networks represent an invaluable source of information to detect, understand and predict social trends and complex dynamics. Unfortunately, the presence of several constraints in data collection, such as costs, dimensions, time and so forth, requires the implementation of a sampling strategy able to maximize the information value for the analysi...
Today’s design of e-services for tourists means dealing with a big quantity of information and metadata that designers should be able to leverage to generate perceived values for users. In this paper we revise the design choices followed to implement a recommender system, highlighting the data processing and architectural point of view, and finally...
Business process logs are composed of event records generated, collected and analyzed at different locations, asynchronously and under the responsibility of different authorities. Their analysis is often delegated to auditors who have a mandate for monitoring processes and computing metrics but do not always have the rights to access the individual...
Students’ confidence about their knowledge may yield high or low discrepancy in contrast to actual performance. Therefore, investigating students’ behavior towards corrective feedback (received after answering a question) becomes of particular interest. We conducted three experimental sessions with 94 undergraduate students using a computer-based a...
Designing e-services for tourists today implies dealing with a large amount of data and metadata that developers should be able to exploit for generating user perceived values. By integrating a Recommender System on a Big Data platform, we constructed the horizontal infrastructure for managing these services in an application-neutral layer. In this...
Process mining uses business event logs to understand the flow of activities, to identify anomalous cases and to enhance processes. Today, real-time process mining tools mainly deal with a single task at a time (process discovery, conformance checking, process enhancement or concept change detection). In this paper, we introduce an underlying layer...
The increasing volume of data created and exchanged in distributed architectures has made databases a critical asset to ensure availability and reliability of business operations. For this reason, a new family of databases, called NoSQL, has been proposed. To better understand the impact this evolution can have on organizations it is useful to focu...
The growing availability of data over the last decades has given rise to a number of successful technologies, ranging from data collection and storage infrastructures to hardware and software tools for efficient computation of analytics. This context, in principle, places a great demand on data quality. As a matter of fact, experience has shown tha...
This article aims at introducing a new configurable and multipurpose electronic voting service based on the blockchain infrastructure. The objective is to design an architecture able to automatically translate service configuration defined by the user into a cloud-based deployable bundle, automating business logic definition, blockchain configurati...
Organisations have seen a rise in the volume of data corresponding to business processes being recorded. Handling process data is a meaningful way to extract relevant information from business processes with impact on the company's values. Nonetheless, business processes are subject to changes during their executions, adding complexity to their ana...