Paolo Ceravolo
University of Milan | UNIMI · Department of Computer Science

Ph.D.

About

202
Publications
49,288
Reads
2,856
Citations

Publications (202)
Preprint
Full-text available
In this paper, we propose an innovative approach to thoroughly explore dataset features that introduce bias in downstream machine-learning tasks. Depending on the data format, we use different techniques to map instances into a similarity feature space. Our method's ability to adjust the resolution of pairwise similarity provides clear insights int...
Article
Full-text available
This paper investigates the effectiveness of GPT-4o-2024-08-06, the latest Large Language Model (LLM) from OpenAI, in detecting business process anomalies, with a focus on rework anomalies. In our study, we developed a GPT-4o-based tool capable of transforming event logs into a structured format and identifying reworked activities within business e...
Article
Full-text available
Predictive Process Monitoring (PPM) extends classical process mining techniques by providing predictive models that can be applied at runtime during the execution of a business process, for example, to predict the sequence of the next event(s) in a case, its outcomes, or performance-related aspects such as the remaining processing time. These predi...
Chapter
Full-text available
Achieving a comprehensive view of a patient’s health using data from Electronic Health Record systems requires the use of advanced analytics. However, effectively managing and curating this data requires carefully designed workflows. While digitization and standardization enable continuous health monitoring, issues such as missing data values and t...
Preprint
Full-text available
The term Language Model encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency and coherence, making them valuable for a wide range of data-related tasks fashioned as pipelines. Th...
Article
Full-text available
To gain a comprehensive understanding of a patient’s health, advanced analytics must be applied to the data collected by electronic health record (EHR) systems. However, managing and curating this data requires carefully designed workflows. While digitalization and standardization enable continuous health monitoring, missing data values and technic...
Article
Full-text available
Citation: Grigore, I.M.; Tavares, G.M.; Silva, M.C.d.; Ceravolo, P.; Barbon Junior, S. Automated Trace Clustering Pipeline Synthesis in Process Mining. Information 2024, 15, 241. https://
Abstract: Business processes have undergone a significant transformation with the advent of the process-oriented view in organizations. The increasing complexity...
Article
Full-text available
As companies store, process, and analyse bigger and bigger volumes of highly heterogeneous data, novel research and technological challenges are emerging. Traditional and rigid data integration and processing techniques become inadequate for a new class of data-intensive applications. There is a need for new architectural, software, and hardware so...
Article
Full-text available
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance wit...
Chapter
Full-text available
Machine Learning is a powerful tool for uncovering relationships and patterns within datasets. However, applying it to large datasets can lead to biased outcomes and quality issues, due to confounding variables indirectly related to the outcome of interest. Achieving fairness often alters training data, like balancing imbalanced groups (privileged...
Article
Full-text available
The heterogeneous data acquired by educational institutes about students’ careers (e.g., performance scores, course preferences, attendance record, demographics, etc.) has been a source of investigation for Educational Data Mining (EDM) researchers for over two decades. EDM researchers have primarily focused on course-specific data analyses of stud...
Article
Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, trace clustering, etc. These methods are usually performed as preprocessing steps and are responsible for mapping complex event data information into a numerical feature space. Most papers choose existing encoding me...
Preprint
Full-text available
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with...
Preprint
Full-text available
Process simulation is an analysis tool in process mining that allows users to measure the impact of changes, prevent losses, and update the process without risks or costs. In the literature, several process simulation techniques are available and they are usually built upon process models discovered from a given event log or learned via deep learni...
Article
Full-text available
In the context of online banking, new users have to register their information to become clients through mobile applications; this process is called digital onboarding. Fraudsters often commit identity fraud by impersonating other people to obtain access to banking services by using personal data obtained illegally and causing damage to the organis...
Preprint
Full-text available
Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, trace clustering, etc. These methods are usually performed as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods...
Conference Paper
Trace clustering has been extensively used to discover aspects of the data from event logs. Process Mining techniques guide the identification of sub-logs by grouping traces with similar behaviors, producing more understandable models, and improving conformance indicators. Nevertheless, little attention has been paid to the relationship among even...
Conference Paper
Full-text available
In this paper, we present a knowledge-based approach for legal document retrieval based on the organization of a textual data repository and on document embedding models. Pre-processed and embedded documents are iteratively classified at sentence level through a terminology extraction and concept formation cycle, using a zero-knowledge approach tha...
Article
Multimodal Analytics in Big Data architectures implies compound configurations of the data processing tasks. Each data modality requires specific analytics that trigger specific data processing tasks. Scalability can be reached at the cost of a careful calibration of the resources shared by the different tasks searching for a trade-off wit...
Article
Full-text available
Continuous monitoring of the well-being of elderly people is set to become an urgent need in the near future due to population aging. Aiming at a unified notion of well-being, we find the Intrinsic Capacity concept consistent with the SMART BEAR project goals. In this study, we mainly focus on the enabling infrastructure, mapping our model...
Chapter
Data has become fundamental to every business process and research area like never before. To date, one of the main open issues in research is managing the data acquired in the field by sensors, logs, etc., by modeling the data structures according to the analyses that will be carried out. In fact, with the advent of Big Data, the need to...
Article
Full-text available
Student engagement reflects the level of involvement in an ongoing learning process and can be estimated through students' interactions with a computer-based learning or assessment system. A prerequisite for stimulating student engagement lies in the capability to have an approximate representation model for comprehending students’ varied (dis...
Article
Background: Over recent years, interest in the development of smart health technologies aimed at supporting independent living for older populations has increased. The integration of innovative technologies, such as the Internet of Things, wearable technologies, artificial intelligence, and ambient-assisted living applications, represents a valuabl...
Article
Full-text available
Criminal investigation adopts Artificial Intelligence to increase the volume of facts that can be investigated and documented in trials. However, the abstract reasoning involved in legal justification and argumentation requires solutions that provide high precision, low generalization error, and retrospective transparency. Three requirement...
Article
Traditional Process Mining offers batch analysis of business processes but does not transpose smoothly into online environments due to specific design constraints. Techniques adapted to support online analysis require particular adjustments that inherently restrict their focus to a single task. In this work, we extend the Concept Drift in Event Strea...
Conference Paper
Continuous monitoring of the well-being of elderly people is set to become an urgent need in the near future due to population aging. Aiming at a unified notion of well-being, we find the Intrinsic Capacity concept consistent with the SMART BEAR project goals. In this study, we mainly focus on the enabling infrastructure, mapping our models...
Preprint
Full-text available
Trace clustering has been extensively used to preprocess event logs. By grouping similar behavior, these techniques guide the identification of sub-logs, producing more understandable models and conformance analytics. Nevertheless, little attention has been paid to the relationship between event log properties and clustering quality. In this work,...
Chapter
Large amounts of highly heterogeneous and fast-arriving data, otherwise known as Big Data, have driven the development of novel Data Management technologies. In this paper, the members of the IFIP Working Group 2.6 share their expertise in some of these technologies, focusing on: recent advancements in data integration, metadata management, data quality, g...
Conference Paper
Identifying anomalies in business processes is a challenge organizations face daily, and it is critical for their operations’ data flow, whether public or private. Most current techniques face this challenge by requiring prior knowledge about business process models or specialist interventions to support the usage of state-of-the-art methods, such as...
Article
Full-text available
Background: Over recent years, interest in the development of smart health technologies aimed at supporting independent living for older populations has increased. The integration of innovative technologies, such as the Internet of Things, wearable technologies, artificial intelligence, and ambient-assisted living applications, represents a valuab...
Preprint
Full-text available
Process discovery methods have obtained remarkable achievements in Process Mining, delivering comprehensible process models to enhance management capabilities. However, selecting the suitable method for a specific event log highly relies on human expertise, hindering its broad application. Solutions based on Meta-learning (MtL) have been promising...
Article
Full-text available
Online Social Networks (OSNs) are considered a key source of information for real-time decision making. However, several constraints reduce the amount of information available to a researcher while increasing the time required by social network mining procedures. In this context, this paper proposes a new framework for sampling Online Social Netwo...
Article
Due to the current emergency situation caused by COVID-19, the scientific literature on the topic has grown rapidly. At the same time, purposeful and targeted research plans with strong background knowledge are urgently needed. However, the huge number of documents produced by multiple communities generates a fragmented terminology that may cause c...
Conference Paper
Knowledge graphs are exploited in criminal investigation to integrate heterogeneous data sources and scale up the operational efficiency of enquiry protocols. Using a declarative perspective, protocols can be viewed as a set of data ingestion procedures and nested exact queries. This meets the probative nature of procedural justice that has to proc...
Article
Full-text available
As is now well established, the world population is aging rapidly and, according to the World Health Organization (WHO), the number of people aged 60 years and older is expected to total 2 billion in 2050. For this reason, an emerging important issue is the definition of a new generation of healthcare platforms capable of monitoring people's quality of li...
Conference Paper
Full-text available
Encoding methods affect the performance of process mining tasks but little work in the literature focused on quantifying their impact. In this paper, we compare 10 different encoding methods from three different families (trace replay and alignment, graph embeddings, and word embeddings) using measures to evaluate the overlaps in the feature space,...
Article
Assuring anomaly-free business process executions is a key challenge for many organizations. Traditional techniques address this challenge using prior knowledge about anomalous cases that is seldom available in real life. In this work, we propose the usage of word2vec encoding and One-Class Classification algorithms to detect anomalies by relying o...
Conference Paper
Full-text available
Real-time response is crucial in many business process scenarios; however, few tools support the online processing of Process Mining tasks. In this paper, we present Concept Drift in Event Stream Framework (CDESF), a tool focused on concept drift detection that also supports several online Process Mining tasks. CDESF highlights the process model ev...
Conference Paper
Assuring anomaly-free business process executions is a key challenge for many organizations. Traditional techniques address this challenge using prior knowledge about anomalous cases that is seldom available in real life. In this work, we propose the usage of word2vec encoding and One-Class Classification algorithms to detect anomalies by relying o...
Conference Paper
Full-text available
Research on database and information technologies has been rapidly evolving over the last couple of years. This evolution was led by three major forces: Big Data, AI and Connected World, which open the door to innovative research directions and challenges spanning four main areas: (i) computational and storage resource modeling and organizati...
Article
Full-text available
Organisations have seen a rise in the volume of data corresponding to business processes being recorded. Handling process data is a meaningful way to extract relevant information from business processes with impact on the company's values. Nonetheless, business processes are subject to changes during their executions, adding complexity to their ana...
Article
Online process mining refers to a class of techniques for analyzing, in real time, event streams generated by the execution of business processes. These techniques are crucial in the reactive monitoring of business processes, timely resource allocation and detection/prevention of dysfunctional behavior. Many interesting advances have been made by the...
Chapter
This article aims at introducing a new process-centric, trusted, configurable and multipurpose electronic voting service based on the blockchain infrastructure. The objective is to design an e-voting service using blockchain able to automatically translate service configuration defined by the end-user into a cloud-based deployable bundle, automatin...
Presentation
Full-text available
Ph.D. Thesis presented at the Computer Science Department, University of Milan, Italy
Book
Full-text available
This book constitutes revised selected papers from the 8th and 9th IFIP WG 2.6 International Symposium on Data-Driven Process Discovery and Analysis, SIMPDA 2018, held in Seville, Spain, on December 13–14, 2018, and SIMPDA 2019, held in Bled, Slovenia, on September 8, 2019. From 16 submissions received for SIMPDA 2018 and 9 submissions received fo...
Article
Full-text available
Business Processes facilitate the execution of a set of activities to achieve the strategic plans of a company. During the execution of a business process model, several decisions can be made that frequently involve the values of the input data of certain activities. The decision regarding the value of these input data concerns not only the correct...
Conference Paper
Full-text available
Data transformation and schema reconciliation are relevant topics in industry due to the incorporation of data-intensive business processes in organizations. As the number of data sources increases, the complexity of such data increases as well, leading to complex and nested data schemata. Nowadays, novel approaches are being employed in academia and...
Chapter
Social Networks represent an invaluable source of information to detect, understand and predict social trends and complex dynamics. Unfortunately, the presence of several constraints on data collection, such as costs, dimensions, time and so forth, requires the implementation of a sampling strategy able to maximize the information value for the analysi...
Chapter
Today’s design of e-services for tourists means dealing with a large quantity of information and metadata that designers should be able to leverage to generate perceived value for users. In this paper we revise the design choices followed to implement a recommender system, highlighting the data processing and architectural point of view, and finally...
Article
Full-text available
Business process logs are composed of event records generated, collected and analyzed at different locations, asynchronously and under the responsibility of different authorities. Their analysis is often delegated to auditors who have a mandate for monitoring processes and computing metrics but do not always have the rights to access the individual...
Chapter
Full-text available
Students’ confidence about their knowledge may diverge to a greater or lesser extent from their actual performance. Therefore, investigating students’ behavior towards corrective feedback (received after answering a question) becomes of particular interest. We conducted three experimental sessions with 94 undergraduate students using a computer-based a...
Chapter
Designing e-services for tourists today implies dealing with a large amount of data and metadata that developers should be able to exploit to generate perceived value for users. By integrating a Recommender System on a Big Data platform, we constructed the horizontal infrastructure for managing these services in an application-neutral layer. In this...
Conference Paper
Process mining uses business event logs to understand the flow of activities, to identify anomalous cases and to enhance processes. Today, real-time process mining tools mainly deal with a single task at a time (process discovery, conformance checking, process enhancement or concept change detection). In this paper, we introduce an underlying layer...
Chapter
The increasing volume of data created and exchanged in distributed architectures has made databases a critical asset to ensure availability and reliability of business operations. For this reason, a new family of databases, called NoSQL, has been proposed. To better understand the impact this evolution can have on organizations it is useful to focu...
Conference Paper
The growing availability of data over the last decades has given rise to a number of successful technologies, ranging from data collection and storage infrastructures to hardware and software tools for efficient computation of analytics. This context, in principle, places a great demand on data quality. As a matter of fact, experience has shown tha...
Conference Paper
This article aims at introducing a new configurable and multipurpose electronic voting service based on the blockchain infrastructure. The objective is to design an architecture able to automatically translate service configuration defined by the user into a cloud-based deployable bundle, automating business logic definition, blockchain configurati...
Conference Paper
Organisations have seen a rise in the volume of data corresponding to business processes being recorded. Handling process data is a meaningful way to extract relevant information from business processes with impact on the company's values. Nonetheless, business processes are subject to changes during their executions, adding complexity to their ana...