Chapter 11
Knowledge Extraction from Events Flows
Alireza Rezaei Mahdiraji, Bruno Rossi, Alberto Sillitti, and Giancarlo Succi
Abstract. In this chapter, we propose an analysis of the approaches and methods available for the automated extraction of knowledge from event flows. We specifically focus on the reconstruction of processes from automatically generated event logs. In this context, we consider that knowledge can be gathered directly by means of the reconstruction of business process models. In the ArtDECO project, we frame such approaches inside delta analysis, that is, the detection of deviations of the executed processes from the planned models. To this end, we provide an overview of the different techniques available for process reconstruction and propose an approach for the detection of deviations. To show its effectiveness, we instantiate the approach on the ArtDECO case study.
11.1 Introduction
Event logs are typically available inside an organisation, and they encode relevant information about the execution of high-level business processes. Even if such information is usually available, knowledge about processes can often be difficult to reconstruct, or processes can deviate from the planned behaviour. Retrieving knowledge from such event flows is thus part of the so-called process mining approach, that is, the reconstruction of business processes from event log traces [30].
The reasons for using process mining are multiple and multi-faceted; two are the most relevant: (1) the information can be used to verify whether the actions inside an organisation are aligned with the designed business processes (the so-called delta analysis); (2) process mining can be used to derive existing patterns in users' activities that can then be used for process improvement. In both cases, the knowledge that can be gathered allows a more efficient usage of resources inside an organisation [28].
Alireza Rezaei Mahdiraji · Bruno Rossi · Alberto Sillitti · Giancarlo Succi
Center for Applied Software Engineering (CASE) - Free University of Bozen-Bolzano,
Piazza Domenicani 3, 39100 Bolzano, Italy
e-mail: {alireza.rezaei,bruno.rossi,alberto.sillitti,gsucci}@unibz.it
G. Anastasi et al. (Eds.): Networked Enterprises, LNCS 7200, pp. 221–236, 2012.
© Springer-Verlag Berlin Heidelberg 2012
In this chapter, we provide an overview of several techniques and approaches that can be used for process mining from event flows. We analyse such approaches and describe how they have been used in the context of the ArtDECO project.
11.2 Knowledge Flow Extraction in the ArtDECO Project
In the context of the ArtDECO project, we consider interconnected networked enterprises that not only run their own internal business processes but also need to orchestrate higher-level processes involving several companies. Three layers have been identified: the business process, the application, and the logical layer. The business process level deals with the interrelations among enterprises at the level of business process models. The application level is the implementation of the business processes, running either intra- or inter-enterprise. The logical layer is an abstraction over the physical resources of the networked enterprises. The contextualisation to the GialloRosso winery case study is discussed in Chapter 2, Figure 1.1.
The vertical view shows all the different abstract processes, while the horizontal line highlights the stakeholders interested in the different information flows. Enterprises are typically integrated horizontally, so we need to consider the integration of the different business processes. As the case study has been instantiated in the wine domain, there are typically three stakeholders to consider: the winery, the delivery company, and the retailer. Each actor has its own business process that is integrated in the overall view of the general interconnected processes.
Extracting knowledge from low-level event flows thus poses more issues than ordinary business process reconstruction, as very fine-grained information is the source of the whole reconstruction process [18, 6].
In this context, a possible approach for process improvement is the following:
1. Collection of data with client-side plug-ins. This step can be performed by using systems for automated data collection, enabling the organisation to harvest information during the execution of the business processes. Such information can then be used to reconstruct knowledge from the low-level events generated;
2. Collection of data from sensor networks. Another source of data comes from sensor networks. They provide low-level data that can be used to evaluate inconsistencies between the actual situation and the expected one. Such low-level events can be used for subsequent phases of the process;
3. Integration of the data sources into a DBMS. All low-level events need to be integrated in a common centralised repository or DBMS. Such a point of integration can be queried for the reconstruction of the process and for the evaluation of triggers when the observed behaviour does not conform to the planned business processes;
4. Reconstruction of the business processes. To reconstruct the business processes in an organisation, we need to extract knowledge from low-level event data. To this end, we need algorithms that - given a set of events - are able to reconstruct the ongoing process;
5. Delta analysis of the deviations from expected behaviour. Once the business processes have been reconstructed, the differences with the planned models can be detected. If necessary, triggers are set up to notify stakeholders about unexpected behaviours in the context of the planned high-level processes.
Such an approach can be supported by a tool such as PROM [27], which allows the collection of low-level events and the subsequent analysis of the processes [3, 4, 9].
In this chapter, we focus on techniques for the extraction of knowledge from
event flows and process reconstruction (Step 4).
11.3 Overview of Process Mining Techniques
A business process is a collection of related activities that produces a specific service or product for a customer. The success of an organisation is directly related to the quality and efficiency of its business processes. Designing these processes is a time-consuming and error-prone task, because the knowledge about a process is usually distributed among the employees and managers that execute it. Moreover, such knowledge is not only distributed within a single organisation but, frequently, it is distributed across several organisations (cross-organisational processes) that belong to the same supply chain. For these reasons, the business process experts in charge of the formal definition of processes face a hard task. Additionally, the designed model needs to adapt to the changes that the market imposes on organisations to remain competitive.
To spend less time and effort and obtain models based on what really happens in an organisation, we can adopt a bottom-up approach, i.e., extract the structure of processes from recorded event logs [26, 2, 5]. Usually, the information systems in an organisation have logging mechanisms to record most of the events. These logs contain information about the actual execution of the processes, such as the list of executed tasks, their order, and the process instances. The method of extracting the structure of a process (a.k.a. the process model) from the event logs is known as process mining, process discovery, or workflow mining [28]. The extracted model can be used to analyse and improve the current business, e.g., it can be used to detect deviations from normal process executions [19].
Each event log consists of several traces. Each trace corresponds to the execution of a process instance, also known as a case. Each case is obtained by executing activities (tasks) in a specific order. A process model is designed to handle similar cases: it specifies the tasks to be executed and their temporal order of execution.
11.3.1 A General Process Mining Algorithm
Figure 11.1 shows the steps of a typical process mining algorithm. The first step deals with reading the content of the log. Most algorithms assume that the log contains at least the case identifiers, the task identifiers, and the execution order of the tasks for each case. This is the minimal information needed for process mining. However, in reality, logs contain further information that can be used to extract more dependency information; e.g., if a log contains the start and completion time of each task (non-atomic tasks), parallelism detection can be done by examining just one trace, otherwise at least two traces are needed [35].
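As an illustration of the latter point, the following is a minimal Python sketch under the assumption of a log with start/completion timestamps (the trace and the task names are invented): two tasks can be flagged as parallel within a single trace when their execution intervals overlap.

from itertools import combinations

# Hypothetical trace with non-atomic tasks: (task, start_time, complete_time).
trace = [
    ("A", 0, 2),
    ("B", 1, 4),  # overlaps A, so A and B ran in parallel
    ("C", 5, 6),
]

def parallel_pairs(trace):
    """Return the task pairs whose execution intervals overlap."""
    pairs = set()
    for (t1, s1, c1), (t2, s2, c2) in combinations(trace, 2):
        if s1 < c2 and s2 < c1:  # the intervals intersect
            pairs.add(frozenset((t1, t2)))
    return pairs

print(parallel_pairs(trace))  # {frozenset({'A', 'B'})}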
The second step deals with the extraction of the dependency relations (also known as follows relations) among tasks. These relations can be inferred using the temporal relationships between tasks in the event log. Task B is dependent on task A iff, in every trace of the log, task B follows task A. This definition is unrealistic for real-world logs because logs always contain noise. Noisy data in the logs are due either to problems in the logging mechanism or to exceptional behaviours. A process mining algorithm has to extract the most common behaviour even in the presence of noisy data, which makes the mining task more difficult and can result in overfitting (i.e., the generated model is too specific and allows only the exact behaviours seen in the log). To cut off the noise, we need a way (e.g., a threshold) to discard the less frequent information and reconstruct a sound business process. Hence, we need to consider the frequency of the dependency relations. We can modify the definition of the dependency relation as follows: task B is dependent on task A iff, in most traces, task B follows task A.
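The following is a minimal sketch of this frequency-based definition (the log, the task names, and the 2/3 threshold are invented for illustration): direct-succession pairs are counted across traces and kept only when they occur in "most" cases.

from collections import Counter

# Hypothetical event log: one list of task identifiers per case (trace).
log = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "C", "B"],  # possible noise: C observed before B
]

follows = Counter()
for trace in log:
    for x, y in zip(trace, trace[1:]):
        follows[(x, y)] += 1  # count direct successions

# Noise-tolerant dependency: keep y-after-x only if it holds in "most"
# traces; the threshold is a tunable parameter, here 2/3 of the cases.
threshold = 2 / 3 * len(log)
dependencies = {pair for pair, count in follows.items() if count >= threshold}
print(dependencies)  # {('A', 'B'), ('B', 'C')}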
The third step deals with the induction of the structure of the process model. The problem is to find a process model that satisfies three conditions: (i) it generates all the traces in the log (in case of noise-free logs), (ii) it covers only few traces that are not in the log (extra behaviour), and (iii) it has the minimal number of nodes (i.e., steps of the process). For simple processes, it is easy to discover a model that recreates the log, but for larger processes this is a hard task: the log may not contain all the combinations of selection and parallel routings, some paths may have a low probability and remain undetected, or the log may contain too much noise.
Fig. 11.1 Steps in a General Process Mining Algorithm
The fourth step deals with the identification of the routing paths among the nodes. The resulting models may include four basic routing constructs: (i) Sequential: the execution of one task is followed by another task; (ii) Parallel: tasks A and B are in parallel if they can be executed in any order or at the same time; (iii) Conditional (Selection or choice): between tasks A and B, either task A or task B is executed; and (iv) Iteration (Loop): a task or a set of tasks is executed multiple times.
The induction of conditions phase corresponds to the induction of the conditions for non-deterministic transitions (i.e., selections) based on process attributes. A process attribute is a specific piece of information used as a control variable for the routing of a case, e.g., the attributes of the documents that are passed between actors. Approaches such as decision rule mining from machine learning can be used to induce these conditions on a set of attributes [16].
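As a sketch of this step, the example below learns a routing condition for a choice between two tasks from case attributes; the attribute names, the data, and the use of scikit-learn as the rule learner are all assumptions made for the illustration.

from sklearn.tree import DecisionTreeClassifier, export_text

# Invented decision-point data: attributes of each case (order amount,
# priority flag) and the branch that was actually taken.
X = [[100, 0], [900, 1], [850, 0], [120, 1]]
y = ["task_A", "task_B", "task_B", "task_A"]

# A shallow decision tree doubles as a readable routing rule.
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["amount", "priority"]))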
The last step converts the resulting model (e.g., a Petri net) into a graphical representation that is easy to understand for the users.
11.3.2 Challenges of Process Mining Algorithms
In [28], several of the most challenging problems of process mining research have been introduced. Some of those challenges are now partially solved, such as mining hidden tasks, mining duplicate tasks, mining problematic constructs (e.g., non-free-choice nets), and using time information, but some of them still need more research. The currently most important challenges of process mining are:
•Mining different perspectives. Most of the research in process mining is devoted to mining the control flow (How?), i.e., the ordering of tasks. Few works aim at mining other perspectives, such as the organisational perspective (Who?), e.g., mining social networks of organisational relationships [31].
•Handling noise. The noise in real-world event logs is unavoidable, and each process mining system should deal with noisy data. As mentioned before, noisy logs cause overfitting. The main idea is that the algorithms should be able to distinguish exceptions from real behaviours. Usually, exceptions are infrequent observations in the logs, e.g., infrequent causal relations. Most algorithms provide a threshold to distinguish between noise and correct behaviours. There are several heuristics in the literature to handle noise.
•Underfitting. The model overgeneralises the behaviours seen in the logs because of the unsupervised setting, i.e., the lack of negative traces in the logs [7]. The logs contain only successful process instances, so negative process instances must be generated from the current log, e.g., by first finding outliers in the current log and labelling them as negative traces, and then applying a binary classification algorithm.
•Conformance Testing. Comparing a prior model of the process with the model extracted by process mining is known as conformance testing or delta analysis. The comparison aims at finding the differences and commonalities between the two models [24].
•Dependency between Cases. Most current approaches assume that there are no dependencies among cases, i.e., the routing of one case does not depend on the routing of other cases or, in other words, the events of different cases are independent. Real-world data may violate this assumption. For example, there may be a competition among different cases for some resources. These so-called history-dependent behaviours are a stronger predictor for the process model [12].
•Non-Unique Model. Different non-equivalent process models can be mined from the same log. Some of them are too specific and some others are too general, but all of them generate the log. By finding a balance between specificity and generality, we can find an appropriate process model.
•Low-Level Event Logs. Most current process mining techniques are only applicable to the logs of process-aware information systems, i.e., logs at the task level of abstraction. This is not the case for many real-life logs. They usually lack the concept of a task and instead contain many low-level events. Although groups of low-level events together represent tasks, it is not easy to infer those groups. The information systems' logs must first be lifted to the task level before a process mining technique can extract the process model [33, 13, 15].
•Un-Structured Event Logs. In real-life systems, even if there is a prior process model, it is not enforced, and the logs contain many behaviours that are instances of deviations from the model. This flexibility results in unstructured process models, i.e., models with lots of nodes and relations. Most techniques in the literature generate unstructured models in such environments. The resulting models are not incorrect and usually capture all the deviations in the log. Two approaches in the literature dealing with this issue use clustering and fuzzy techniques [33, 14].
11.3.3 Review of the Approaches
There are several approaches for mining process models from log data [28]. Based on the strategy they use to search for an appropriate process model, we can divide them into local and global approaches. Local strategies rely only on local information to build the model step by step, while global strategies search the space of potential models to find the model. Local strategies usually have problems with noise and with discovering the more complex constructs. Most of the current process mining approaches are in the first category and only a few of them are in the global category. Figure 11.1 depicts the general steps of local algorithms. In the following, we concisely introduce some of these approaches.
In [16, 17], three algorithms are developed, namely Merge Sequential, Split Sequential, and Split Parallel. These algorithms are the best known for dealing with duplicate tasks. The first and second algorithms are suitable for sequential process models and the third one for parallel processes. They extract process models as Hidden Markov Models (HMMs). An HMM is basically a Finite State Machine (FSM), but each transition has an associated probability and each state has a finite set of output symbols. The Merge Sequential and Split Sequential algorithms use generalization and specialization approaches, respectively. Merge Sequential is a bottom-up algorithm, i.e., it starts with the most specific process model, with one separate path for each trace in the log, and then iteratively generalizes the model by merging states that have the same output symbols, until a termination criterion is satisfied. To reduce the size of the initial model, a prefix HMM can be used, i.e., all states that share a common prefix are mapped to a single state. The problem with the generalization operator is the number of merging operations, i.e., even with few states with the same symbols, the number of merging operations is usually very large.
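A minimal sketch of the prefix idea, with an invented log: traces that share a prefix share the corresponding states, so the initial, most specific model shrinks from one path per trace to a prefix tree.

# Sketch: build the prefix-merged initial model as a trie of tasks.
def build_prefix_model(log):
    root = {}
    for trace in log:
        node = root
        for task in trace:
            node = node.setdefault(task, {})  # a shared prefix reuses the state
    return root

log = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
print(build_prefix_model(log))
# {'A': {'B': {'C': {}, 'D': {}}, 'C': {}}}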
The Split Sequential algorithm aims to overcome the complexity of Merge Sequential. It starts with the most general process model (i.e., without duplicate tasks and able to generate all the behaviours in the log) and then iteratively splits states with more than one incoming transition into two states. The Split Parallel algorithm is an extension of Split Sequential for concurrent processes. The reason for this extension is that, unlike in sequential processes, splitting a node in a concurrent process changes the dependency graph and may thus have global side effects. So, instead of applying the split operator on the model, the split operations are done at the level of the process instances. It is also top-down and works as follows: suppose that activity A is split. The split operator distinguishes among certain occurrences of A, e.g., A1 and A2. Then, it induces a general model based on the current instances, which contains the two nodes A1 and A2 instead of only A. After the specialization terminates, the re-labelled nodes are restored to their original labels. This approach also uses a decision rule induction algorithm to induce the conditions for non-deterministic transitions.
In [25], a new approach based on a block-oriented representation has been introduced to extract minimal complete models. In the block-oriented representation, each process model consists of a set of nested building blocks. Each building block consists of one or more activities or building blocks that are connected by operators. There are four main operators, namely sequence, parallel, alternative, and loop. The operators define the control flow of the model.
The works in [28, 29, 30] are among the most extensive in process mining. The authors started by developing the alpha algorithm and over time extended it with many modifications to tackle different challenges. They proved that the alpha algorithm is suitable for a specific class of models. In the first version of the alpha algorithm, they assumed that logs are noise-free and complete. The alpha algorithm is unable to find short loops and implicit places, and it works based on binary relations in the logs. There are four relations: follows, causal, parallel, and unrelated. Two tasks A and B have a follows relation if they appear next to each other in the log. This is the basic relation from which the other relations are extracted. Two tasks A and B have a causal relation if A follows B, but B does not follow A. If B also follows A, then the tasks have a parallel relation. When A and B are not involved in a follows relation, they are said to be unrelated. All the dependency relations are inferred based on local information in the log. Additionally, because the algorithm works based on sets, it cannot mine models with duplicate tasks. The alpha algorithm works only based on the follows relation without considering frequencies, therefore it cannot handle noise. In [34], the alpha algorithm was enhanced with several heuristics that consider frequencies to handle noise. The main idea behind the heuristics is as follows: the more often task B follows task A and the less often A follows B, the more likely it is that A is a cause for B. Because the algorithm mainly works based on binary relations, the non-free-choice constructs cannot be captured.
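A minimal sketch of these four binary relations, derived from direct succession over an invented log (no frequencies, hence no noise handling, exactly as in the basic algorithm described above):

from itertools import product

# Hypothetical log: A starts, B and C occur in either order, D ends.
log = [["A", "B", "D"], ["A", "C", "D"], ["A", "B", "C", "D"], ["A", "C", "B", "D"]]

tasks = {t for trace in log for t in trace}
follows = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}

for a, b in product(sorted(tasks), repeat=2):
    if a >= b:
        continue  # report each unordered pair once
    ab, ba = (a, b) in follows, (b, a) in follows
    if ab and ba:
        print(a, "||", b, "(parallel)")
    elif ab:
        print(a, "->", b, "(causal)")
    elif ba:
        print(b, "->", a, "(causal)")
    else:
        print(a, "#", b, "(unrelated)")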
In [21], they first extended the three relational metrics in [34] and added two new metrics that are more suitable for distinguishing between choice and parallel situations. Then, based on a training set containing the values of the five metrics and the actual relations between pairs of activities (causal, choice, and parallel), they applied a classification rule learner to induce rules that distinguish among the different relations based on the values of the five relational metrics.
Clustering-Based Techniques. Most process mining approaches produce only one process model. This single model is supposed to represent every single detail in the log, so the resulting model is intricate and hard to understand. Many real-world logs yield such unstructured models, usually because they allow very flexible execution of the process. Flexible environments generate heterogeneous logs, i.e., they contain information about very different cases. Another reason for unstructured logs is that some logs record information about different processes that belong to the same domain. In both cases, a single process model is unable to describe this kind of log, and the resulting model is either very specific (when an exact process model is the goal) or over-general. The basic idea of using clustering techniques in the process mining domain is to divide the original log into several smaller logs, each of which contains only homogeneous cases. Then, for each partition, a separate process model is extracted.
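A minimal sketch of this partitioning step, with invented traces: each trace is reduced to a task-occurrence profile and clustered, and a separate model would then be mined per cluster (KMeans is just one possible choice of clustering algorithm).

from sklearn.cluster import KMeans

# Hypothetical heterogeneous log: two very different kinds of cases.
log = [["A", "B", "C"], ["A", "B", "B", "C"], ["X", "Y"], ["X", "Y", "Y"]]
tasks = sorted({t for trace in log for t in trace})

# Feature vector per trace: how often each task occurs in it.
X = [[trace.count(t) for t in tasks] for trace in log]

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for trace, cluster in zip(log, labels):
    print(cluster, trace)  # homogeneous sub-logs, to be mined separately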
In the process mining literature, there are several studies using clustering in different ways. For example, in [1], each trace is represented with a vector of features extracted from different perspectives, i.e., control flow, organisation, data, etc. In [10, 11, 23], a hierarchy of process models is generated, i.e., each log partition is further partitioned if it is not expressive enough.
These are all local algorithms, but there are three main global approaches in the literature: genetic process mining, an approach based on first-order logic, and fuzzy process mining.
Genetic Process Mining. Since genetic algorithms are global search algorithms, genetic process mining is a global algorithm that can handle noise and can capture problematic structures such as non-free-choice constructs [22, 32]. To apply genetic algorithms to process mining, a new representation for process models, known as the causal matrix, was proposed in [22, 32]. Each individual, or potential process model, is represented by a causal matrix, which contains information about the causal relations between the activities and the input/output of each activity. The causal matrix can be mapped to Petri Nets [22, 32]. Three genetic operators are used, namely elitism, crossover, and mutation. The elitism operator selects a percentage of the best process models for the next generation. Crossover recombines causality relationships in the current population, while the mutation operator inserts new causal relationships and adds/removes activities from the input or output of each activity. One of the main drawbacks of genetic process mining is the computational complexity of the approach.
First-Order-Logic Approach. Another global search approach applies first-order-logic learning to process mining [20]. In contrast to the other approaches seen so far, this approach generates declarative process models instead of imperative (procedural) ones. Imperative approaches define an exact execution order for a set of tasks, while declarative approaches only focus on what should be done. In [20], the process mining problem is defined as a binary classification problem, i.e., the log contains positive and negative traces.
The advantages of using first-order learning are as follows: (i) it can discover structural patterns, i.e., search for patterns of relations between rows in the event log, (ii) by using a declarative representation, it generates more flexible process models, and (iii) it can use prior knowledge in learning.
Fuzzy Process Mining. The third global strategy is the fuzzy approach. The idea of fuzzy process mining is to generate different views of the same process model based on configurable parameters. Based on what is interesting, the configuration parameters are used to keep only those parts of the process that are relevant and to remove the others. To achieve this objective, it considers both global and local information from different aspects of the process, such as the control flow and the organisational structure, to produce a process model. This is why the fuzzy approach is also known as a multi-perspective approach [14].
11.4 Application to the GialloRosso Case Study
In this section, we start by defining the process of delta analysis; then we delve into an application to the GialloRosso case study. Knowledge extraction in the ArtDECO project has been contextualised to the analysis of deviations from expected business models. Specifically, we used a local algorithm based on happened-before relations, supported by a rule-based system for task reconstruction from event logs.
Fig. 11.2 Phases of the lifecycle of Business Processes for delta analysis
Figure 11.2 shows the different phases of a business process, from modelling and instantiation to execution. We generally start with a business process model that represents all the steps as they have been conceived by a business process analyst. Such a model is then instantiated and executed; at the same time, several executions of the same process model can be running. Each process execution needs to be monitored to derive indications from the real running processes. This information can then be used to reconstruct the real model. Afterwards, delta analysis is used to compare the planned and the actual models to derive the deviations. For delta analysis, the monitoring phase is critical for the derivation of the low-level events that can be analysed for knowledge extraction. Once the actual execution model has been reconstructed by means of process mining algorithms, delta analysis derives the deviations from the execution traces.
In our case, the process of delta analysis is done by means of the following steps:
(a) determination of the happened-before relations among tasks from the original planned model: for each pair of tasks, if task A precedes task B, then A -> B;
(b) process monitoring for the collection of events for the different instances of process execution;
(c) task reconstruction from events by means of a rule-based system, as in Figure 11.3;
(d) determination of the violations of happened-before relations in the execution traces, i.e., when an event of task B is detected with no event of task A happening before. Such violations are annotated.
In particular, to reconstruct the process phases from low-level events, the events need to be mapped to higher-level constructs (Figure 11.3). For this, we use a rule-based system that maps all the events, according to domain information, to the higher levels, so that it is possible to associate each event to a phase of a process. A rule is defined by means of the source application that generated the event, the text of the event, and a discriminator for a particular process instance (e.g., an item the instance of the process is related to). Each rule specifically maps to a task in the original process model. The actor/agent that generated the event is already part of the event metadata. After the rules are applied, events that do not comply with any rule are discarded and not considered for delta analysis.
Fig. 11.3 Phases of the lifecycle of Business Processes for delta analysis
If we consider the case study of the winery (the GialloRosso case, see Chapter 2), we can instantiate the approach on part of the defined business processes to show how it has been applied.
The planned behaviour in the case study foresees that when the handling of the wine starts, a message must be sent from the distributor of the winery to the carrier, which is responsible for the transportation (Figure 11.4). In parallel, the distributor starts the quality monitoring process for the delivery, so as to gather objective data about the performance of the delivery process. Such information is then used to alert the distributors and to update the final wine quality. At this point, the whole process ends.
Figure 11.4 already contains information about the different execution traces and their deviations from the planned behaviour. The detail is shown with the process number (e.g., P1 or P2) and an indication of a (d)eviation or an (u)nauthorized action during the step. We explain in the following how these activities are detected and the software implementation that has been used for the analysis.
To explain the approach undertaken, we focus just on the initial data exchange between the distributor and the carrier (top left part of Figure 11.4). Actions can deviate from the original plans under some circumstances, as the process can start without being triggered by a message in the information system. For example, the process can be started by personal communication between the two actors of the process. Even in this trivial case, such an activity can be detrimental for process improvement analyses, as there is no information about how much time was required to pass from the decision taken at the management level to the control center level. Also, there is no tracking of the original communication, nor any evaluation of errors in the definition and/or execution of the directives that were assigned. Furthermore, the execution violates the originally planned sequence of actions.
Fig. 11.4 The ’discover carrier’ subprocess results from delta analysis: the original workflow is tagged with deviations of two process instances P1 and P2
This can be enforced by low-level data process mining. The following preconditions and post-conditions (happened-before relations) can be inferred from the original model. They can be derived from several execution traces or from the original process. If we consider the Handling Start (H_S), Mail Send (MS), and Handling Execution (H_E) actions:
H_S = pre (MS(Distributor, Carrier))
H_S = post (MS(Carrier, Distributor))
H_S = post (H_E)
We derive that the Handling Start phase has as a precondition that a message must be exchanged between the two actors of the business process. The post-conditions are another exchange of messages among the actors and the execution of the successive phase of the process. In other terms, using the happened-before notation, these are the conditions for the part of the process that we consider:
MS(Distributor, Carrier) -> H_S;
H_S -> MS(Carrier, Distributor);
H_S -> H_E;
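A minimal sketch of step (d) on these relations is the following; the relations mirror the notation above, while the execution trace is invented and represents a process started without the triggering message:

# Happened-before relations taken from the planned model above.
happened_before = {
    ("MS(Distributor,Carrier)", "H_S"),
    ("H_S", "MS(Carrier,Distributor)"),
    ("H_S", "H_E"),
}

def violations(trace, relations):
    """Report tasks observed although a required predecessor was not seen."""
    seen, found = set(), []
    for task in trace:
        for before, after in relations:
            if task == after and before not in seen:
                found.append((before, task))
        seen.add(task)
    return found

# Invented trace: handling starts without the distributor's message
# (e.g., triggered by personal communication between the actors).
trace = ["H_S", "MS(Carrier,Distributor)", "H_E"]
print(violations(trace, happened_before))
# [('MS(Distributor,Carrier)', 'H_S')]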
Once we have this information for the planned model, we need to focus on the actual instances of the process. If we run several instances of the process while monitoring the actors, we collect execution traces that need to be mapped to higher-level constructs. This is done by a domain-dependent rule-based system that maps events to tasks and process phases. An alternative is to use machine learning approaches, which need to be trained on several execution traces. The rule-based system also needs to discriminate the specific instance of the process, so we need a way to separate the flows of events across different process instances. The following is an example of a rule:
APP{"any"} AND EVENT{"*warehouse*"} AND DISC{"item *"} -> Task A
In this case, we are defining a rule to process events that are generated by any application, that are related to documents that have warehouse in the title, and that should be divided according to a tagged item. Therefore, events with different items will be mapped to different process flows. All the events that comply with this rule will be mapped to task A, keeping the timestamp information.
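A minimal sketch of how such a rule could be matched against incoming events, assuming shell-style wildcards for the patterns; the event records and field names are invented:

from fnmatch import fnmatch

# The rule from the text: any application, event text containing "warehouse",
# discriminated per item, mapping to task A.
rule = {"app": "*", "event": "*warehouse*", "disc": "item *", "task": "Task A"}

# Invented low-level event records.
events = [
    {"app": "ERP", "text": "open warehouse registry", "disc": "item 42", "ts": 10},
    {"app": "Mail", "text": "send delivery note", "disc": "item 42", "ts": 11},
]

for e in events:
    if (fnmatch(e["app"], rule["app"])
            and fnmatch(e["text"], rule["event"])
            and fnmatch(e["disc"], rule["disc"])):
        # Matching events keep their timestamp and are grouped per item (the
        # discriminator), yielding one event flow per process instance.
        print(e["ts"], e["disc"], "->", rule["task"])  # only the first event matches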
Once the events have been associated to the planned tasks, the detection of violations of the original model is done by evaluating the temporal execution of the events associated to each task: if a task is executed before another temporally related task that should precede it, the violation is annotated.
Figure 11.4 shows an example of the original process annotated with violations from two running instances, P1 and P2. Each violation indicates the possibility that the process has been executed without following the original temporal relations among phases. In the case study, this can mean that the communication flows among stakeholders followed different paths than those planned. We can see the number of violations per task, and we can further focus on each task to inspect the causes of such deviations by looking at the single low-level events. We can see two different types of information: the (d)eviations, i.e., actions undertaken without respecting the temporal relations, for example sending the delivery without prior communication by the distributor, and the (u)nauthorized actions, i.e., actions that were not allowed at a specified step, like recording information in a warehouse's registry. The latter kind of actions can be specified by the user to be evaluated at runtime against the real execution traces.
Fig. 11.5 PROM plug-in for the execution of delta analysis
For the execution of process mining and delta analysis, we implemented a plug-in for the PROM software [27]. In particular, we took the opportunity to use the low-level events collected by PROM to perform the analysis. The plug-in has been integrated into the Eclipse application (www.eclipse.org) with the support of the Graphical Modeling Framework (GMF).
Figure 11.5 gives an overview of the software prototype used for the analysis. We see the same business case used to explain the approach, loaded into the application's workspace. The current prototype takes three different types of input: a) the definition of a workflow, b) the definition of the rules for each node in the workflow, and c) a set of events generated from several execution traces. Given these inputs, the application parses all the events and annotates the business process with the relevant information. The relevant quadrants of the application in Figure 11.5 are the top and bottom ones. In the top one, the loaded process is visualized. In the bottom one, there is the output of all the deviations detected, and the user can select the different views to activate in the workspace.
In the case used to exemplify the approach, two different execution traces have been instantiated by generating events and set as the input for the application. After the analysis has been performed, each node of the loaded process is marked with the deviation information derived from delta analysis. This is a particular scenario, in which the data is analysed ex-post. As noted in Section 11.2, the usefulness of the proposed approach lies in analysing data in real time, with execution traces collected and analysed as the actions are performed by the actors of the process.
11.5 Conclusions
In this chapter, we proposed an analysis of methods for the automated extraction of knowledge from event flows. We focused specifically on process mining, that is, reconstructing business processes from event log traces.
Process mining can be important for organisations that want to reconstruct the knowledge hidden in their event logs. Typically, any organisation has the opportunity to collect this kind of information. The advantages are multi-faceted; we referred mostly to two specific areas.
On one side, such knowledge can be used to evaluate whether the high-level business processes are aligned with the business plan models. As such, process mining can be used to see whether the actual behaviour deviates from the expected behaviour. On the other side, the knowledge can be used to detect hidden behaviours - i.e., not encoded in the high-level business processes - inside the organisation. Such behaviours can then be the focus of further analyses to see whether they are really required, whether resources are wasted, or whether process improvement/restructuring opportunities can derive from them.
We proposed an approach based on delta analysis to derive information from low
level event flows and reconstruct the original processes. We showed in the context
of the case study of the ArtDECO project, the GialloRosso winery, how event flows
are used to reconstruct the original processes and detect deviations from the planned
model.
References
1. Aires da Silva, G., Ferreira, D.R.: Applying Hidden Markov Models to Process Mining. In: Rocha, A., Restivo, F., Reis, L.P., Torrao, S. (eds.) Sistemas e Tecnologias de Informacao: Actas da 4a Conferencia Iberica de Sistemas e Tecnologias de Informacao, pp. 207–210. AISTI/FEUP/UPF (2009)
2. Coman, I., Sillitti, A.: An Empirical Exploratory Study on Inferring Developers' Activities from Low-Level Data. In: 19th International Conference on Software Engineering and Knowledge Engineering (SEKE 2007), Boston, MA, USA, July 9-11 (2007)
3. Coman, I., Sillitti, A.: Automated Identification of Tasks in Development Sessions. In: 16th IEEE International Conference on Program Comprehension (ICPC 2008), Amsterdam, The Netherlands, June 10-13 (2008)
4. Coman, I., Sillitti, A., Succi, G.: Investigating the Usefulness of Pair-Programming in a Mature Agile Team. In: 9th International Conference on eXtreme Programming and Agile Processes in Software Engineering (XP 2008), Limerick, Ireland, June 10-14 (2008)
5. Coman, I., Sillitti, A.: Automated Segmentation of Development Sessions into Task-related Subsections. International Journal of Computers and Applications 31(3) (2009)
6. Coman, I., Sillitti, A., Succi, G.: A Case-study on Using an Automated In-process Software Engineering Measurement and Analysis System in an Industrial Environment. In: 31st International Conference on Software Engineering (ICSE 2009), Vancouver, BC, Canada, May 16-24 (2009)
7. Cook, J.E., Wolf, A.L.: Discovering Models of Software Processes from Event-Based Data. ACM Transactions on Software Engineering and Methodology 7(3), 215–249 (1998)
8. Ferreira, D., Zacarias, M., Malheiros, M., Ferreira, P.: Approaching Process Mining with Sequence Clustering: Experiments and Findings. In: Alonso, G., Dadam, P., Rosemann, M. (eds.) BPM 2007. LNCS, vol. 4714, pp. 360–374. Springer, Heidelberg (2007)
9. Fronza, I., Sillitti, A., Succi, G.: Modeling Spontaneous Pair Programming when New Developers Join a Team. In: 3rd International Symposium on Empirical Software Engineering and Measurement (ESEM 2009), Lake Buena Vista, FL, USA, October 15-16 (2009)
10. Greco, G., Guzzo, A., Pontieri, L., Saccà, D.: Discovering expressive process models by clustering log traces. IEEE Trans. Knowl. Data Eng. 18(8), 1010–1027 (2006)
11. Greco, G., Guzzo, A., Pontieri, L.: Mining taxonomies of process models. Data & Knowledge Engineering 67(1), 74–102 (2008)
12. Goedertier, S., Martens, D., Baesens, B., Haesen, R., Vanthienen, J.: Process Mining as First-Order Classification Learning on Logs with Negative Events. In: ter Hofstede, A.H.M., Benatallah, B., Paik, H.-Y. (eds.) BPM Workshops 2007. LNCS, vol. 4928, pp. 42–53. Springer, Heidelberg (2008)
13. Guenther, C.W., Van der Aalst, W.M.P.: Mining Activity Clusters from Low-Level Event Logs. BETA Working Paper Series, WP 165. Eindhoven University of Technology, Eindhoven (2006)
14. Günther, C.W., van der Aalst, W.M.P.: Fuzzy Mining – Adaptive Process Simplification Based on Multi-perspective Metrics. In: Alonso, G., Dadam, P., Rosemann, M. (eds.) BPM 2007. LNCS, vol. 4714, pp. 328–343. Springer, Heidelberg (2007)
15. Günther, C.W., Rozinat, A., van der Aalst, W.M.P.: Activity Mining by Global Trace Segmentation. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 128–139. Springer, Heidelberg (2010)
16. Herbst, J.: Dealing with concurrency in workflow induction. In: Proceedings of the 7th European Concurrent Engineering Conference, Society for Computer Simulation (SCS), pp. 169–174 (2000)
17. Herbst, J., Karagiannis, D.: Integrating Machine Learning and Workflow Management to Support Acquisition and Adaptation of Workflow Models. International Journal of Intelligent Systems in Accounting, Finance and Management 9, 67–92 (2000)
18. Janes, A., Scotto, M., Sillitti, A., Succi, G.: A perspective on non-invasive software management. In: 2006 IEEE Instrumentation and Measurement Technology Conference (IMTC 2006), Sorrento, Italy, April 24-27 (2006)
19. Janes, A., Sillitti, A., Succi, G.: Non-invasive software process data collection for expert identification. In: 20th International Conference on Software Engineering and Knowledge Engineering (SEKE 2008), San Francisco, CA, USA, July 1-3 (2008)
20. Lamma, E., Mello, P., Riguzzi, F., Storari, S.: Applying Inductive Logic Programming to Process Mining. In: Blockeel, H., Ramon, J., Shavlik, J., Tadepalli, P. (eds.) ILP 2007. LNCS (LNAI), vol. 4894, pp. 132–146. Springer, Heidelberg (2008)
21. Maruster, L., Weijters, A.J.M.M., Van der Aalst, W.M.P., Van den Bosch, A.: A rule-based approach for process discovery: Dealing with noise and imbalance in process logs. Data Mining and Knowledge Discovery 13(1), 67–87 (2006)
22. de Medeiros, A.K.A., Weijters, A.J.M.M., van der Aalst, W.M.P.: Genetic Process Mining: A Basic Approach and Its Challenges. In: Bussler, C.J., Haller, A. (eds.) BPM 2005. LNCS, vol. 3812, pp. 203–215. Springer, Heidelberg (2006)
23. de Medeiros, A.K.A., Guzzo, A., Greco, G., van der Aalst, W.M.P., Weijters, A.J.M.M., van Dongen, B.F., Saccà, D.: Process Mining Based on Clustering: A Quest for Precision. In: ter Hofstede, A.H.M., Benatallah, B., Paik, H.-Y. (eds.) BPM Workshops 2007. LNCS, vol. 4928, pp. 17–29. Springer, Heidelberg (2008)
24. Rozinat, A., van der Aalst, W.M.P.: Conformance Testing: Measuring the Fit and Appropriateness of Event Logs and Process Models. In: Bussler, C.J., Haller, A. (eds.) BPM 2005. LNCS, vol. 3812, pp. 163–176. Springer, Heidelberg (2006)
25. Schimm, G.: Mining exact models of concurrent workflows. Comput. Ind. 53, 265–281 (2004)
26. Scotto, M., Sillitti, A., Succi, G., Vernazza, T.: Dealing with Software Metrics Collection and Analysis: a Relational Approach. Studia Informatica Universalis, Suger 3(3), 343–366 (2004)
27. Sillitti, A., Janes, A., Succi, G., Vernazza, T.: Collecting, Integrating and Analyzing Software Metrics and Personal Software Process Data. In: Proceedings of the 29th EUROMICRO Conference (2003)
28. Van der Aalst, W.M.P., Weijters, A.: Process mining: a research agenda. Comput. Ind. 53, 231–244 (2002)
29. Van der Aalst, W.M.P., Van Dongen, B.F., Herbst, J., Maruster, L., Schimm, G., Weijters, A.J.M.M.: Workflow mining: A survey of issues and approaches. Data & Knowledge Engineering 47(2), 237–267 (2003)
30. Van der Aalst, W.M.P., Weijters, A.J.M.M., Maruster, L.: Workflow Mining: Discovering Process Models from Event Logs. IEEE Transactions on Knowledge and Data Engineering 16(9), 1128–1142 (2004)
31. Van der Aalst, W.M.P., Reijers, H., Song, M.: Discovering Social Networks from Event Logs. Computer Supported Cooperative Work 14(6), 549–593 (2005)
32. van der Aalst, W.M.P., de Medeiros, A.K.A., Weijters, A.J.M.M.: Genetic Process Mining. In: Ciardo, G., Darondeau, P. (eds.) ICATPN 2005. LNCS, vol. 3536, pp. 48–69. Springer, Heidelberg (2005)
33. Song, M., Günther, C.W., van der Aalst, W.M.P.: Trace Clustering in Process Mining. In: Ardagna, D., Mecella, M., Yang, J. (eds.) BPM 2008 Workshops. LNBIP, vol. 17, pp. 109–120. Springer, Heidelberg (2009)
34. Weijters, A.J.M.M., Van der Aalst, W.M.P.: Rediscovering Workflow Models from Event-Based Data using Little Thumb. Integrated Computer-Aided Engineering 10(2), 151–162 (2003)
35. Wen, L., Wang, J., Van der Aalst, W.M.P., Wang, Z., Sun, J.: A Novel Approach for Process Mining Based on Event Types. BETA Working Paper Series, WP 118. Eindhoven University of Technology, Eindhoven (2004)