Figure - available from: Journal of Intelligent Information Systems
Source publication
The problem of classifying business log traces is addressed in the context of security risk analysis. We consider the challenging setting where the actions performed in a process instance are described in the log as executions of low-level operations (such as “Pose a query over a DB”, “Upload a file into an ftp server”), while analysts and business...
Citations
... Online deviance detection and predictive process monitoring To the best of our knowledge, the problem of estimating whether a running process instance is deviant or not (or the probability of its being deviant) has never been explicitly addressed as a predictive learning task in the literature. In fact, in online settings, the anticipated prediction (forecast) of deviant behaviours has commonly been addressed via reasoning-based approaches, e.g., in the contexts of security breach detection (Fazzinga et al., 2018) and of compliance monitoring (Ly et al., 2015). ...
Detecting deviant traces in business process logs is crucial for modern organizations, given the harmful impact of deviant behaviours (e.g., attacks or faults). However, training a Deviance Prediction Model (DPM) solely with supervised learning methods is impractical in scenarios where only a few examples are labelled. To address this challenge, we propose an Active-Learning-based approach that leverages multiple DPMs and a temporal ensembling method that can train and merge them in a few training epochs. Our method needs expert supervision only for a few unlabelled traces exhibiting high prediction uncertainty. Tests on real data (of either complete or ongoing process instances) confirm the effectiveness of the proposed approach.
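The querying step described in the abstract above (selecting the few high-uncertainty unlabelled traces for expert labelling) can be sketched as follows. The scoring function, the ensemble representation, and all names are illustrative assumptions, not the paper's exact DPM or temporal-ensembling design:

```python
import statistics

def query_indices(ensemble_probs, budget):
    """Rank unlabelled traces by prediction uncertainty and return the
    indices of the `budget` most uncertain ones for expert labelling.
    `ensemble_probs[i]` holds each ensemble member's P(deviant) for
    trace i; uncertainty is scored as ensemble disagreement (std dev)
    plus closeness of the mean prediction to 0.5."""
    scores = []
    for i, probs in enumerate(ensemble_probs):
        mean = sum(probs) / len(probs)
        disagreement = statistics.pstdev(probs)
        margin = 0.5 - abs(mean - 0.5)  # high when the ensemble is undecided
        scores.append((disagreement + margin, i))
    return [i for _, i in sorted(scores, reverse=True)[:budget]]

# Trace 1 is the one the ensemble is unsure about, so it gets queried.
picked = query_indices([[0.9, 0.95, 0.92], [0.5, 0.6, 0.4], [0.1, 0.05, 0.08]],
                       budget=1)
```

Any other uncertainty measure (e.g. predictive entropy) could be plugged in the same way; only the relative ranking matters for selecting the query set.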
... Fazzinga et al. [92] proposed a method for online and offline classification of event log traces as potential security breaches. They create a security breach model, which is used later in conformance checking. ...
... They used a probabilistic approach in the created model and in the subsequent conformance checking. In a follow-up paper [93], they proposed a classification framework that combines their previous model-driven method [92] with example-driven classification. In the model-driven approach, a security breach model is created, and incoming traces are classified based on conformance checking. ...
The digitalization of our society is only possible in the presence of secure and reliable software systems governing ongoing critical processes, so-called critical information infrastructures. The understanding of mutual interdependencies of events and processes is crucial for cybersecurity and software reliability. One of the promising ways to tackle these challenges is process mining, which is a set of techniques that aims to mine essential knowledge from processes, thus providing more perspectives and temporal context to data interpretation and process understanding. However, it is unclear how process mining can help and can be practically used in the context of cybersecurity and reliability.
Therefore, in this work, we investigate the potential of process mining to aid cybersecurity and software reliability, and to analyze and support research efforts in these areas. Concretely, we collect existing process mining applications and discuss current trends and promising research directions that can be used to tackle current cybersecurity and software reliability challenges. To this end, we conduct a systematic literature review covering 35 relevant research approaches to examine how process mining is currently used for these tasks and what the research gaps and promising research directions in the area are. This work extends our previous work, which focused solely on cybersecurity; the extension is motivated by the relative closeness and similar goals of the two fields, in which some approaches tend to overlap.
... The core problem of checking whether low-level traces comply with high-level behavioral models is considered in [32,90] in the challenging setting where the ground-truth mapping between the events of the traces and the activities in the model is unknown and only uncertain information about it is available; actually, the models in [32] are meant to represent known security breach patterns. ...
... Since pre-processing the traces with deterministic log abstraction methods [2,59,83] in such a setting can lead to misleading results [90], both proposals [32,90] adopt a probabilistic approach for evaluating the degree of compliance of each trace over either all of its admissible interpretations [90] or a representative subset of them, computed via Monte Carlo sampling [32]. In particular, in [90], conformance is analyzed at different levels of detail, across a hierarchy of (SESE) process fragments. ...
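The Monte Carlo idea described in the snippet above can be sketched in a few lines: sample interpretations of a low-level trace from prior event-activity mapping probabilities, and estimate the probability that the trace matches a high-level pattern (here, a breach pattern). The event names, priors, and pattern below are toy assumptions, not taken from the cited works:

```python
import random

# Hypothetical prior: P(high-level activity | low-level event)
PRIORS = {
    "query_db":   {"CheckRecord": 0.7, "ExfiltrateData": 0.3},
    "ftp_upload": {"PublishReport": 0.4, "ExfiltrateData": 0.6},
}

def matches_breach(activities):
    """Toy high-level breach pattern: data is exfiltrated at least twice."""
    return activities.count("ExfiltrateData") >= 2

def compliance_estimate(trace, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(trace matches the breach pattern):
    sample one interpretation per iteration (one activity per event,
    drawn from the priors) and count how often it matches."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        interp = [
            rng.choices(list(PRIORS[e]), weights=list(PRIORS[e].values()))[0]
            for e in trace
        ]
        hits += matches_breach(interp)
    return hits / n_samples

# True value here is 0.3 * 0.6 = 0.18; the estimate converges to it.
p = compliance_estimate(["query_db", "ftp_upload"])
```

Exhaustive enumeration over all interpretations (as in [90]) would compute the same quantity exactly, but its cost grows exponentially with trace length, which is what motivates the sampling variant.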
The ever-increasing attention of process mining (PM) research to the logs of low structured processes and of non-process-aware systems (e.g., ERP, IoT systems) poses a number of challenges. Indeed, in such cases, the risk of obtaining low-quality results is rather high, and great effort is needed to carry out a PM project, most of which is usually spent in trying different ways to select and prepare the input data for PM tasks. Two general AI-based strategies are discussed in this paper, which can improve and ease the execution of PM tasks in such settings: (a) using explicit domain knowledge and (b) exploiting auxiliary AI tasks. After introducing some specific data quality issues that complicate the application of PM techniques in the above-mentioned settings, the paper illustrates these two strategies and the results of a systematic review of relevant literature on the topic. Finally, the paper presents a taxonomical scheme of the works reviewed and discusses some major trends, open issues and opportunities in this field of research.
... With respect to our approach, this method is more focused on malware behavior analysis. An approach based on the analysis of logs is reported in [18,19]. Here, the authors capture possible modifications across application runs, obtaining a phylogeny tree. ...
Mobile phones are currently the main targets of continuous malware attacks. Usually, new malicious code is generated by conveniently changing existing code. Accordingly, it becomes very useful to identify new approaches for the analysis of malware phylogeny. This paper proposes a data-aware process mining approach performing dynamic malware analysis. The process mining is performed using a multi-perspective declarative approach that models a malware family as a set of constraints (together with their data attributes) among the system call traces gathered from infected applications. The models are used to detect execution patterns or other relationships among families. The obtained models can be used to verify whether a checked malware is a potential member of a known malware family and how it differs from other malware variants of the family. The approach is implemented and applied on a dataset composed of 5648 trusted and malicious applications across 39 malware families. The obtained results show strong performance in malware phylogeny generation.
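A single declarative (DECLARE-style) constraint of the kind used above to model a family can be checked over a system call trace with a linear scan. The sketch below covers only the control-flow perspective (no data attributes, unlike the cited multi-perspective approach), and the trace contents are made up:

```python
def holds_response(trace, a, b):
    """DECLARE 'response(a, b)': every occurrence of activity `a`
    is eventually followed by an occurrence of activity `b`."""
    pending = False  # True while some `a` still awaits a later `b`
    for event in trace:
        if event == a:
            pending = True
        elif event == b:
            pending = False
    return not pending

# Toy syscall traces: every `open` must eventually be followed by `close`.
ok = holds_response(["open", "read", "write", "close"], "open", "close")
bad = holds_response(["open", "read", "write"], "open", "close")
```

A family model would then be a conjunction of many such constraints, and membership of a new sample could be scored by the fraction of the family's constraints its trace satisfies.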
... Checking whether low-level traces comply with high-level behavioural models is faced in [1,18] in the challenging setting where the real mapping from the traces' events to the models' activities is uncertain. Rather than pre-processing the traces with heuristic abstraction methods [2,26], in both proposals the degree of compliance of each trace τ is evaluated probabilistically, either over the set of all possible interpretations of τ [1] or over a Monte Carlo-generated sample of the former [18], using prior event-activity mapping probabilities to discard "meaningless" interpretations. The Monte Carlo procedure of [17] can also exploit known activity-level constraints for this purpose. ...
The ever-increasing attention of Process Mining (PM) research to the logs of lowly-structured processes and of non-process-aware systems (e.g., ERP, IoT systems) poses several challenges stemming from the lower quality of these logs, concerning the precision, completeness and abstraction with which they describe the activities performed. In such scenarios, most of the resources spent in a PM project (in terms of time and expertise) are usually devoted to trying different ways of selecting and preparing the input data for PM tasks, in order to eventually obtain significant, interpretable and actionable results. Two general AI-based strategies are discussed here that have been partly pursued in the literature to improve the achievements of PM efforts on low-quality logs and to limit the amount of human intervention needed: (i) using explicit domain knowledge, and (ii) exploiting auxiliary AI tasks. The paper also provides an overview of trends, open issues and opportunities in the field.
... The tool of Stocker and Accorsi (2014) enables the configuration of security concerns (i.e., authentication, binding of duty and separation of duties) when generating synthetic event logs. A different event log configuration according to security concerns is suggested in Fazzinga et al. (2018), who use security risk as a criterion for filtering related traces. To support decision making in security audits, Accorsi et al. (2013) suggest mining both the control- and the data-flow, since only the combination of both perspectives makes it possible to analyze security requirements. ...
Privacy regulations for data can be seen as a major driver for data sovereignty measures. A specific example is the case of event data that is recorded by information systems during the processing of entities in domains such as e-commerce or healthcare. Since such data, typically available in the form of event log files, contains personalized information on the specific processed entities, it can expose sensitive information that may be attributed back to individuals. In recent years, a plethora of methods have been developed to analyze event logs under the umbrella of process mining. However, the impact of privacy regulations on the technical design as well as the organizational application of process mining has been largely neglected. In this paper, we set out to develop a protection model for event data privacy, which lifts the well-established notion of differential privacy. Starting from common assumptions on the event logs used in process mining, we study potential privacy leakages and means to protect against them. We show at which stages a protection model for event logs shall be used to counter privacy leakages. We instantiate the notion of differential privacy for process discovery methods, i.e., algorithms that aim at the construction of a process model from an event log. The general feasibility of our approach is demonstrated by its application to two publicly available real-life event logs.
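One common way to instantiate differential privacy for process discovery (a generic mechanism, not necessarily the one proposed in the paper above) is to perturb the trace-variant counts that discovery algorithms consume. A minimal sketch, assuming each individual contributes at most one trace (so the counting query has sensitivity 1):

```python
import random

def dp_variant_counts(counts, epsilon, seed=0):
    """Release trace-variant counts under epsilon-differential privacy
    by adding Laplace(1/epsilon) noise to each count, then rounding and
    clamping at zero. Laplace noise is drawn as the difference of two
    independent Exponential(epsilon) variates."""
    rng = random.Random(seed)
    noisy = {}
    for variant, count in counts.items():
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        noisy[variant] = max(0, round(count + noise))
    return noisy

# Hypothetical variant counts from an event log; discovery would then
# run on the noisy counts instead of the raw ones.
released = dp_variant_counts({"a,b,c": 100, "a,c": 40}, epsilon=1.0)
```

Rounding and clamping keep the release plausible as a log summary; note that post-processing like this cannot weaken the differential-privacy guarantee.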
... Existing DECLARE models are used in [17] to improve a discovered process model, encoded as a process tree, in a post-processing fashion, through three alternative methods: (i) brute-force search; (ii) a genetic programming scheme where candidate models are evolved according to a fitness function accounting for both log conformance metrics [34] and the fraction of a-priori constraints fulfilled; and (iii) a heuristic algorithm that directly tries to correct a model based on the types of constraint it infringes. Conformance/conformance checking: The core problem of checking whether low-level traces comply with reference behavioral models was considered in [1,19] in the challenging setting where the ground-truth mapping between the traces' events and the (higher-level) model's activities is unknown and only uncertain information about it is available; actually, the models in [19] are meant to represent known security-breach patterns. As pre-processing the traces with deterministic log abstraction methods [28,33,3] in such a setting may lead to misleading results [1], both proposals adopted a probabilistic approach for evaluating the degree of compliance of each trace over either all of its possible interpretations [1] or a representative subset of them, computed via Monte Carlo sampling [19]. Prior event-activity mapping probabilities are used in both works to discard "meaningless" interpretations. ...
"Extending Process Mining techniques with additional AI capabilities to better exploit incomplete/low-level log data: solutions, open issues and perspectives"
... Therefore, we define a proper meta-model for process mining that allows considering context information related to environment and location, which is necessary in our IoT use case. Related to event log data, a large body of research exists on security-oriented analysis [30,15,13]. For instance, the tool of Stocker and Accorsi [30] allows configuring security concerns (i.e., authentication, binding of duty and separation of duties) when generating synthetic event logs. ...
Process mining uses event data recorded by information systems to reveal the actual execution of business processes in organizations. In doing so, event logs can expose sensitive information that may be attributed back to individuals (e.g., reveal information on the performance of individual employees). Due to the GDPR, organizations are obliged to consider privacy throughout the complete development process, which also applies to the design of process mining systems. The aim of this paper is to develop a privacy-preserving system design for process mining. The user-centered view on the system design makes it possible to track who does what, when, why, where and how with personal data. The approach is demonstrated on an IoT manufacturing use case.
... PAR systems are expert systems that run in the background and continuously monitor the execution of processes, predict their future, and, possibly, provide recommendations. A substantial body of research exists on evaluating risks, also known as process monitoring and prediction; see, e.g., publications [4,5,8,12,14,15,17] and the survey [18]. Yet, as also indicated in [10], "existing works on interventions, i.e. mitigating [actions] are rare". ...
Process-aware Recommender systems (PAR systems) are information systems that aim to monitor process executions, predict their outcome, and recommend effective interventions to reduce the risk of failure. This paper discusses monitoring, predicting, and recommending using a PAR system within a financial institute in the Netherlands to avoid faulty executions. While predictions were based on the analysis of historical data, the most opportune intervention was selected on the basis of human judgment and subjective opinions. The results showed that, while the predictions of risky cases were relatively accurate, no reduction was observed in the number of faulty executions. We believe that this was caused by incorrect choices of interventions. While a large body of research exists on monitoring and predicting based on facts recorded in historical data, research on fact-based interventions is relatively limited. This paper reports on lessons learned from the case study in finance and proposes a new methodology to improve the performances of PAR systems. This methodology advocates the importance of several cycles of interactions among all actors involved so as to develop interventions that incorporate their feedback and are based on insights from factual, historical data.
... [Flattened table fragment: context categories and properties for event-activity mappings, with supporting references, e.g. personal & social context (activity relationship [37]; ability; entity property [3,37]) and task context (history [5,14,38,39,45]; goal; causality [41,42,43,46,47,48,49]; location [49,40]).] In fact, many context properties have not been covered yet when mapping events to activities, which can be concluded from the comparison with the analysis of context-awareness in process mining. Particularly, the context properties "activity", "ability", "entity property", "equipment", and "location" are not sufficiently covered by the literature we identified within our review. ...
Event log files are used as input to any process mining algorithm. A main assumption of process mining is that each event has already been assigned to a distinct process activity. However, such mapping of events to activities is a considerable challenge. The current status quo is that approaches indicate only likelihoods of mappings, since there is often more than one possible solution. To increase the quality of event-to-activity mappings, this paper derives a contextualization for event-activity mappings and argues for a stronger consideration of contextual factors. Based on a literature review, the paper provides a framework for classifying context factors for event-activity mappings. We aim to apply this framework to improve the accuracy of event-activity mappings and, thereby, process mining results in scenarios with low-level events.