About
113 Publications
30,069 Reads
2,890 Citations
Introduction
Luigi Pontieri is a senior researcher at the High Performance Computing and Networks Institute (ICAR-CNR) of the National Research Council of Italy, and contract professor at the University of Calabria, Italy.
He received the Laurea Degree in Computer Engineering, in July 1996, and the Ph.D. in System Engineering and Computer Science, in April 2001, both from the University of Calabria, Italy.
His current research interests include Knowledge Discovery, Data and Process Mining, Data Compression,
Publications (113)
Detecting deviant traces in business process logs is a crucial task in modern organizations due to the detrimental effect of certain deviant behaviors (e.g., attacks, frauds, faults). Training a Deviance Detection Model (DDM) only over labeled traces with supervised learning methods is ill-suited to real-life contexts where a small fraction of the traces are...
Intelligent Ticket Management Systems, equipped with automated ticket classification tools, are an advanced solution for handling customer-support activities. Some recent approaches to ticket classification leverage Deep Learning (DL) methods, in place of traditional ones using standard Machine Learning and feature engineering techniques. However,...
Predicting the final outcome of an ongoing process instance is a key problem in many real-life contexts. This problem has been addressed mainly by discovering a prediction model by using traditional machine learning methods and, more recently, deep learning methods, exploiting the supervision coming from outcome-class labels associated with histori...
Generally, companies and organizations can greatly improve their business processes by suitably monitoring and analyzing the log data that they gather for these processes in the form of traces. We here consider the challenging scenario where there is an abstraction gap between the “low-level” events composing the traces and the “high-level” activit...
A Correction to this paper has been published: 10.1007/s13740-021-00121-2
The ever-increasing attention of process mining (PM) research to the logs of low structured processes and of non-process-aware systems (e.g., ERP, IoT systems) poses a number of challenges. Indeed, in such cases, the risk of obtaining low-quality results is rather high, and great effort is needed to carry out a PM project, most of which is usually...
Classification-oriented Machine Learning methods are a precious tool, in modern Intrusion Detection Systems (IDSs), for discriminating between suspected intrusion attacks and normal behaviors. Many recent proposals in this field leveraged Deep Neural Network (DNN) methods, capable of learning effective hierarchical data representations automaticall...
Intrusion detection tools have largely benefitted from the usage of supervised classification methods developed in the field of data mining. However, the data produced by modern system/network logs pose many problems, such as the streaming and non-stationary nature of such data, their volume and velocity, and the presence of imbalanced classes. Cla...
Traditionally, Expert Systems have found a natural application in the behavioral analysis of processes. In fact, they have proved effective in the tasks of interpreting the data collected during the process executions and of analyzing these data with the aim of diagnosing/detecting anomalies. In this context, we focus on log data generated by execu...
Intrusion detection systems have to cope with many challenging problems, such as unbalanced datasets, fast data streams and frequent changes in the nature of the attacks (concept drift). To this aim, here, a distributed genetic programming (GP) tool is used to generate the combiner function of an ensemble; this tool does not need a heavy additional...
Mining deviances from expected behaviors in process logs is a relevant problem in modern organizations, owing to their negative impact in terms of monetary/reputation losses. Most proposals to deviance mining combine the extraction of behavioral features from log traces with the induction of standard classifiers. Difficulties in capturing the multi...
Process Mining (PM) is meant to extract knowledge on the behavior of business processes from historical log data. Lately, increasing attention has been paid to Predictive Process Monitoring, a field of PM that tries to extend process monitoring systems with prediction capabilities. Several current proposals in the literature...
The ever increasing attention of Process Mining (PM) research to the logs of lowly-structured processes and of non process-aware systems (e.g., ERP, IoT systems) poses several challenges stemming from the lower quality that these logs have, concerning the precision, completeness and abstraction with which they describe the activities performed. In...
"Extending Process Mining techniques with additional AI capabilities to better exploit incomplete/low-level log data: solutions, open issues and perspectives"
Modern intrusion detection systems must be able to discover new types of attacks in real-time. To this aim, automatic or semi-automatic techniques can be used; outlier detection algorithms are particularly apt to this task, as they can work in an unsupervised way. However, due to the different nature and behavior of the attacks, the performance of...
Process Discovery techniques, which allow graph-like models to be extracted from large process logs, are a valuable means for grasping a summarized view of real business processes’ behaviors. If augmented with statistics on process performances (e.g., processing times), such models help study the evolution of process performances across different processing...
Current approaches to the security-oriented classification of process log traces can be split into two categories: (i) example-driven methods, that induce a classifier from annotated example traces; (ii) model-driven methods, based on checking the conformance of each test trace to security-breach models defined by experts. These categories are orth...
Business Process Intelligence (BPI) and Process Mining, two very active areas of research, share a great interest in the issue of discovering an effective Deviance Detection Model (DDM), computed by accessing log data. The DDM allows us to understand whether novel instances of the target business process are deviant or not, thu...
In many application contexts, a business process' executions are subject to performance constraints expressed in an aggregated form, usually over predefined time windows, and detecting a likely violation to such a constraint in advance could help undertake corrective measures for preventing it. This paper illustrates a prediction-aware event proces...
Computer Science is a relatively young discipline, but in the last two decades the advances in hardware technology and software engineering have induced notable changes in the way users interact with computers. In particular, several processes involving data have changed in a radical manner. As a matter of fact, the amount of data stored in reposito...
The problem of classifying business log traces is addressed in the context of security risk analysis. We consider the challenging setting where the actions performed in a process instance are described in the log as executions of low-level operations (such as “Pose a query over a DB”, “Upload a file into an ftp server”), while analysts and business...
Monitoring the performances of a business process is a key issue in many organizations, especially when the process must comply with predefined performance constraints. In such a case, empowering the monitoring system with prediction capabilities would allow us to know in advance a constraint violation, and possibly trigger corrective measures to e...
Process mining methods have been proven effective in turning historical log data into actionable process knowledge. However, most of them work under the assumption that the events reported in the logs can be easily mapped to well-defined process activities, that are the terms in which analysts are used to reason on the processes’ behaviors. We here...
Increasing attention has been paid to the problem of explaining and analyzing "deviant cases" generated by a business process, i.e. instances of the process that diverged from prescribed/expected behavior (e.g. frauds, faults, SLA violations). In many real settings, such cases are labelled with a numerical deviance measure, and the analyst wants to...
Increasing attention has been paid to the detection and analysis of “deviant” instances of a business process that are connected with some kind of “hidden” undesired behavior (e.g. frauds and faults). In particular, several recent works faced the problem of inducing a binary classification model (here named deviance detection model) that can discri...
This paper presents a framework for analyzing and predicting the performances of a business process, based on historical data gathered during its past enactments. The framework hinges on an inductive-learning technique for discovering a special kind of predictive process models, which can support the run-time prediction of a given performance measu...
Log analysis and querying recently received a renewed interest from the research community, as the effective understanding of process behavior is crucial for improving business process management. Indeed, currently available log querying tools are not completely satisfactory, especially from the viewpoint of easiness of use. As a matter of fact, th...
In the context of security risk analysis, we address the problem of classifying log traces describing business process executions. Specifically, on the basis of some (possibly incomplete) knowledge of the process structures and of the patterns representing unsecure behaviors, we classify each trace as instance of some process and/or as potential se...
Increasing attention has been paid of late to the problem of detecting and explaining “deviant” process instances, i.e. instances diverging from normal/desired outcomes (e.g., frauds, faults, SLA violations), based on log data. Current solutions make it possible to discriminate between deviant and normal instances, by combining the extraction of (sequence-bas...
We consider the scenario where the executions of different business processes are traced into a log, where each trace describes a process instance as a sequence of low-level events (representing basic kinds of operations). In this context, we address a novel problem: given a description of the processes’ behaviors in terms of high-level activities...
The issue of devising efficient and effective solutions for supporting the analysis of process logs has recently received great attention from the research community, as effectively accomplishing any business process management task requires understanding the behavior of the processes. In this paper, we propose a new framework supporting the analys...
Predicting the fix time (i.e. the time needed to eventually solve a case) is a key task in an issue tracking system, which attracted the attention of data-mining researchers in recent years. Traditional approaches only try to forecast the overall fix time of a case when it is reported, without updating this preliminary estimate as long as the case...
The increasing availability of large process log repositories calls for efficient solutions for their analysis. In this regard, a novel specialized compression technique for process logs is proposed, that builds a synopsis supporting a fast estimation of aggregate queries, which are of crucial importance in exploratory and high-level analysis tasks...
Process discovery techniques are a precious tool for analyzing the real behavior of a business process. However, their direct application to lowly structured logs may yield unreadable and inaccurate models. Current solutions rely on event abstraction or trace clustering, and assume that log events refer to well-defined (possibly low-level) process...
Process discovery has emerged as a powerful approach to support the analysis and the design of complex processes. It consists of analyzing a set of traces registering the sequence of tasks performed along several enactments of a transactional system, in order to build a process model that can explain all the episodes recorded over them. An approach...
Process discovery (i.e. the automated induction of a behavioral process model from execution logs) is an important tool for business process analysts/managers, who can exploit the extracted knowledge in key process improvement and (re-)design tasks. Unfortunately, when directly applied to the logs of complex and/or lowly-structured processes, such...
This paper presents a framework for analyzing and predicting the performances of a business process, based on historical data gathered during its past enactments. The framework hinges on an inductive-learning technique for discovering a special kind of predictive process models, which can support the run-time prediction of a given performance measu...
Process Mining techniques have been gaining attention, especially as concerns the discovery of predictive process models. Traditionally focused on workflows, they usually assume that process tasks are clearly specified, and referred to in the logs. This limits however their application to many real-life BPM environments (e.g. issue tracking systems...
Fix-time prediction is a key task in bug tracking systems, which was recently faced through predictive data mining approaches, trying to estimate the time needed to solve a case, at the very moment when it is reported. And yet, the actions performed on a bug, along its life, can help refine the prediction of its (remaining) fix-time, by leveraging...
Predicting run-time performances is a hot issue in ticket resolution processes. Recent efforts to take account of the sequence of resolution steps suggest that predictive Process Mining (PM) techniques could be applied in this field, if suitably adapted to the peculiarities of ticket systems. In particular, the performances of a ticket instance u...
Modeling behavioral aspects of business processes is a hard and costly task, which usually requires heavy intervention of business experts. This explains the increasing attention given to process mining techniques, which automatically extract behavioral process models from log data. In the case of complex processes, however, the models identified b...
This paper presents a novel approach to the discovery of predictive process models, which are meant to support the run-time prediction of some performance indicator (e.g., the remaining processing time) on new ongoing process instances. To this purpose, we combine a series of data mining techniques (ranging from pattern mining, to non-parametric regres...
Process Mining techniques have been gaining attention, owing to their potential to extract compact process models from massive logs. Traditionally focused on workflows, they often assume that process tasks are clearly specified, and referred to in the logs. This, however, limits their application to many real-life BPM environments (e.g. issue tr...
A key task in process mining consists of building a graph of causal dependencies over process activities, which can then help derive more expressive models in some high-level modeling language. An approach to accomplishing this task is presented, where the learning process can exploit background knowledge available to the analyst. The method is bas...
The discovery of predictive models for process performances is an emerging topic, which poses a series of difficulties when considering complex and flexible processes, whose behaviour tends to change over time depending on context factors. We try to face such a situation by proposing a predictive-clustering approach, where different context-related...
Discovering predictive models for run-time support is an emerging topic in Process Mining research, which can effectively help optimize business process enactments. However, making accurate estimates is not easy especially when considering fine-grain performance measures (e.g., processing times) on a complex and flexible business process, where per...
A prominent goal of process mining is to build automatically a model explaining all the episodes recorded in the log of some transactional system. Whenever the process to be mined is complex and highly-flexible, however, equipping all the traces with just one model might lead to mixing different usage scenarios, thereby resulting in a spaghetti-lik...
Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events ar...
Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events ar...
The high-order coclustering problem, i.e., the problem of simultaneously clustering heterogeneous types of domain, has become an active research area in the last few years, due to the notable impact it has on several application scenarios. This problem is generally faced by optimizing a weighted combination of functions measuring the quality of coc...
Bi-clustering, i.e., simultaneously clustering two types of objects based on their correlations, has been studied actively in the last few years, by virtue of its impact on several relevant applications, such as text mining, collaborative filtering, and gene expression analysis. In particular, many research efforts were recently spent on extending...
A knowledge-based framework for supporting and analyzing loosely-structured collaborative processes (LSCPs) is presented in this paper. The framework takes advantage of a number of knowledge representation, management and processing capabilities, including recent process mining techniques. In order to support the enactment, analysis and optimiza...
Process-oriented systems have been increasingly attracting data mining researchers, mainly due to the advantages that the application of inductive process mining techniques to log data could open to both the analysis of complex processes and the design of new process models. However, the actual impact of process mining in the industry is endangered...
Process mining techniques have been receiving great attention in the literature for their ability to automatically support process (re)design. Typically, these techniques discover a concrete workflow schema modelling all possible execution patterns registered in a given log, which can be exploited subsequently to support further-coming enactments....
Histograms are used to summarize the contents of relations into a number of buckets for the estimation of query result sizes. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed in the past for determining bucket boundaries which provide accurate estimations. However, while search strategies for optimal bucket boundaries are rather...
Process Mining techniques exploit the information stored in the execution log of a process to extract some high-level process model, useful for analysis or design tasks. Most of these techniques focus on "structural" aspects of the process, in that they only consider what elementary activities were executed and in which ordering. Hence, any other "...
Classical outlier detection approaches may hardly fit process mining applications, since in these settings anomalies emerge not only as deviations from the sequence of events most often registered in the log, but also as deviations from the behavior prescribed by some (possibly unknown) process model. These issues have been faced in the paper via a...
The “internetworked” enterprise domain poses a challenge to IT researchers, due to the complexity and dynamicity of collaboration processes that are to be supported in such a scenario typically. A major issue in this context, where several entities are possibly involved that cooperate according to continuously evolving schemes, is to develop suitab...
Mining process logs has been increasingly attracting the data mining community, due to the chances the development of process mining techniques can offer to the analysis and design of complex processes. Currently, these techniques focus on “structural” aspects by only considering which activities were executed and in which order, and disregard any...
Mining process logs has been increasingly attracting the data mining community, due to the chances the development of process mining techniques can offer to the analysis and design of complex processes. Currently, these techniques focus on "structural" aspects by only considering which activities were executed and in which order, and disregard any o...
In this paper, we propose a classification technique for Web pages, based on the detection of structural similarities among semistructured documents, and devise an architecture exploiting such technique for the purpose of information extraction. The proposal significantly differs from standard methods based on graph-matching algorithms, and is base...
We propose an incremental algorithm for discovering clusters of duplicate tuples in large databases. The core of the approach is the usage of an indexing technique which, for any newly arrived tuple mu, allows the efficient retrieval of a set of tuples in the database which are most similar to mu, and which are likely to refer to the same real-world...
The high-order co-clustering problem, i.e., the problem of simultaneously clustering several heterogeneous types of domains, is usually faced by minimizing a linear combination of some optimization functions evaluated over pairs of correlated domains, where each weight expresses the reliability/relevance of the associated contingency table. Clearly...
Process-oriented systems have been increasingly attracting the data mining community, due to the opportunities the application of inductive process mining techniques to log data can open to both the analysis of complex processes and the design of new process models. Currently, these techniques focus on structural aspects of the process and disregard da...
Process mining techniques have recently received notable attention in the literature for their ability to assist in the (re)design of complex processes by automatically discovering models that explain the events registered in some log traces provided as input. Following this line of research, the paper investigates an extension of such basic appro...
Process mining techniques have been receiving great attention in the literature for their ability to automatically support process (re)design. The output of these techniques is a concrete workflow schema that models all the possible execution scenarios registered in the logs, and that can be profitably used to support further-coming enactments. I...
Because of the widespread diffusion of semistructured data in XML format, much research effort is currently devoted to support the storage and retrieval of large collections of such documents. XML documents can be compared as to their structural similarity, in order to group them into clusters so that different storage, retrieval, and processing te...
Because of the widespread diffusion of semistructured data in XML format, much research effort is currently devoted to support the storage and retrieval of large collections of such documents. XML documents can be compared as to their structural similarity, in order to group them into clusters so that different storage, retrieval, and processing te...
We propose an incremental algorithm for clustering duplicate tuples in large databases, which allows any new tuple t to be assigned to the cluster containing the database tuples which are most similar to t (and hence are likely to refer to the same real-world entity t is associated with). The core of the approach is a hash-based indexing technique that...