Conference Paper

Optimizing customer journey using process mining and sequence-aware recommendation

Authors: Terragni and Hassani

Abstract

Customer journey analysis aims at understanding customer behavior, both in the traditional offline setting and through online website visits. Particularly for the latter, web analytics tools like Google Analytics and customer journey maps have shown their usefulness and are widely used by web companies. Nevertheless, they provide an oversimplified view of user behavior, in addition to other limitations related to their narrow scope over the cases. This paper contributes a novel approach to overcome these limitations by applying process mining and recommender-system techniques to web-log customer journey analysis. Through our novel approach we are able to (i) discover the process that best describes the user behavior, (ii) discover and compare the processes of different behavioral clusters of users, and (iii) use this analysis to improve the journey by optimizing selected KPIs (Key Performance Indicators) via personalized recommendations based on the user behavior. In particular, with process mining it is possible to identify specific customer journey paths that can be enforced to optimize some KPIs. Then, with our novel sequence-aware recommender system, it is possible to recommend to users particular actions that will optimize the selected KPIs, using the customer journey as implicit feedback. The correctness of the introduced concepts is demonstrated through a real-life case study of 10 million events representing the online journeys of 2 million users over one month. We show and evaluate the discovered process models from this real web log, then use the information extracted from the process models to select and optimize a KPI via personalized recommendations.
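To make the pipeline concrete, the following minimal Python sketch illustrates the core idea of KPI-driven, sequence-aware recommendation: journeys serve as implicit feedback for an empirical first-order Markov model, and the recommended next action is the one most likely to lead to a KPI target state. This is an illustration under simplifying assumptions (toy journeys, a hypothetical `purchase` KPI target, first-order transitions), not the authors' actual implementation.

```python
from collections import defaultdict

# Toy journeys: each is one user's ordered sequence of page/action events.
journeys = [
    ["home", "search", "product", "cart", "purchase"],
    ["home", "product", "home", "exit"],
    ["home", "search", "product", "exit"],
]
KPI_TARGET = "purchase"   # hypothetical KPI: reaching the purchase page

# First-order transition counts: the journeys act as implicit feedback.
trans = defaultdict(lambda: defaultdict(int))
for j in journeys:
    for a, b in zip(j, j[1:]):
        trans[a][b] += 1

def p_reach(state, target, depth=6, seen=()):
    """Probability of reaching `target` from `state` within `depth` steps
    under the empirical Markov chain (cycle-guarded)."""
    if state == target:
        return 1.0
    if depth == 0 or state in seen or not trans[state]:
        return 0.0
    total = sum(trans[state].values())
    return sum(c / total * p_reach(nxt, target, depth - 1, seen + (state,))
               for nxt, c in trans[state].items())

def recommend(state):
    """Next action that maximizes the estimated chance of hitting the KPI."""
    succ = trans[state]
    return max(succ, key=lambda nxt: p_reach(nxt, KPI_TARGET)) if succ else None

print(recommend("search"))   # -> 'product'
```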


... ➢ Using process mining to analyze customer journeys and pinpoint the critical touchpoints of online platforms [1], [9] for personalization strategy optimization. ➢ Using data-driven marketing strategies to determine the consumer decision process and optimize user experiences [2], [20], [22]. ...
... ➢ Improving ad auction mechanisms by capturing behavioral trends for sponsored search and placement-targeted ads [17]. ➢ Designing dynamic platforms on the fly, serving personalized recommendations with sequence-aware models [9]. ➢ Using marketing analytics frameworks that integrate behavioral, contextual, and demographic data to personalize the user journey [22]. IV. RESEARCH METHODOLOGY: The methodology for this study is data-driven and exploratory; we combine user behavior analytics with personalization approaches in an online shopping context. We elaborate on the goal of discovering and modeling click streams, dwell time, purchase history, and browsing patterns from user interaction data to deliver more personalized, and potentially more engaging, shopping experiences. ...
... Moreover, we further stratified engagement metrics using taxonomies from the related literature [5], [6], allowing us to produce a structured analysis of emotional as well as behavioral responses. These were further incorporated in the form of sequence-aware recommendation systems [9] and gamified elements [8] to examine user receptiveness. Historical data were analyzed using machine-learning-based models to predict user preferences and serve recommendations dynamically. The intricacies of dynamic recommendation engines [7], [11] serve as a benchmark for other works within this research. ...
Article
Full-text available
Personalization in online shopping with user behavior data: a prospective study of transformative potential for consumer insights through e-commerce environments
E-commerce platforms can tap huge volumes of user interaction data, such as click streams, search queries, purchase history, and dwell time; stored together, these data are of prime importance for inferring customers' preferences and intent. Platforms use advanced analytics, machine learning, and recommendation systems to tailor content, product recommendations, and the user experience to individual behavior. Such personalized engagement not only boosts user engagement but also increases conversion rates and customer loyalty. By applying data-driven techniques, businesses can deliver better and faster service. The work looks at widely used techniques such as collaborative filtering, process mining, and real-time recommendation engines. It also notes the effect personalization has on metrics like retention, average order value, and browsing time. The manuscript further examines data privacy, ethical personalization, and the trade-off between customization and user control. Using real-world case studies, it provides examples of personalization strategies and competitive advantage in practice. In addition, it touches on integration problems between data across devices and platforms. Ultimately, the study postulates a behavior-driven, personalization-oriented dynamic framework for e-commerce, in which user behavior analytics identifies the enablers that bring about optimization in the data economy ecosystem.
... Their dataset was not from e-commerce but from advertising, which we considered to be a closer area to e-commerce. In both of the papers, recommendations for product pages were made for the user to visit [49,50]. In 2018, researchers used online ticket sales data to create a process model to make predictions and recommendations [51]. ...
... If similar rates of activity-level journeys were observed in brick-and-mortar shops, we would see that the majority of the customers would enter and exit the store, which might require immediate action from the managers. Hence, instead of ignoring them as in existing studies [9,49,50], we considered activity-level journeys to be an improvement opportunity. By using complementary analysis methods, e-commerce practitioners can benefit from the results: ...
... In addition, an important implication arising from activity-level journeys was that any predictions or recommendations made during the first two events of a session would fail most of the time. Therefore, running models after the third event can result in higher accuracy rates in the reviewed process mining studies in e-commerce [49][50][51] and improve recommendation performance from the e-commerce practitioners' perspective. ...
Article
Full-text available
Understanding customer journeys is key to e-commerce success. Many studies have been conducted to obtain journey maps of e-commerce visitors. To our knowledge, a complete, end-to-end and structured map of e-commerce journeys is still missing. In this research, we proposed a four-step methodology to extract and understand e-commerce visitor journeys using process mining. In order to obtain more structured process diagrams, we used techniques such as activity type enrichment, start and end node identification, and Levenshtein distance-based clustering in this methodology. For the evaluation of the resulting diagrams, we developed a model utilizing expert knowledge. As a result of this empirical study, we identified the most significant factors for process structuredness and their relationships. Using a real-life big dataset which has over 20 million rows, we defined activity-, behavior-, and process-level e-commerce visitor journeys. Exploitation and exploration were the most common journeys, and it was revealed that journeys with exploration behavior had significantly lower conversion rates. At the process level, we mapped the backbones of eight journeys and tested their qualities with the empirical structuredness measure. By using cart statuses at the beginning and end of these journeys, we obtained a high-level end-to-end e-commerce journey that can be used to improve recommendation performance. Additionally, we proposed new metrics to evaluate online user journeys and to benchmark e-commerce journey design success.
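As a hedged illustration of the Levenshtein distance-based clustering step mentioned above: journeys can be treated as activity sequences, pairwise edit distances computed, and hierarchical clustering applied. The journeys and cluster count below are made up; the study's exact enrichment and clustering setup is not reproduced.

```python
import itertools
from scipy.cluster.hierarchy import fcluster, linkage

def lev(a, b):
    """Edit distance between two activity sequences (dynamic programming)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

journeys = [                      # toy visitor journeys (activity sequences)
    ("home", "search", "product", "cart", "purchase"),
    ("home", "search", "product", "exit"),
    ("home", "account", "settings", "exit"),
    ("home", "account", "exit"),
]

# Condensed pairwise distance matrix -> average-linkage hierarchical clustering.
dists = [lev(a, b) for a, b in itertools.combinations(journeys, 2)]
labels = fcluster(linkage(dists, method="average"), t=2, criterion="maxclust")
print(labels)                     # e.g. [1 1 2 2]
```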
... While some research has delved into the effects of CJA on firm performance, the focus remains narrow, often limited to business performance. Previous literature predominantly highlights CJA's correlation with return on investment (Homburg and Tischer, 2023), KPIs (Terragni and Hassani, 2019), and CRM performance (Marino and Lo Presti, 2018). These studies, however, overlook the direct linkage of CJA with crucial facets like business growth, profit, business development, and customer satisfaction, all of which are pivotal drivers of business performance (Roman and Rusu, 2022; Auh et al., 2019). ...
... Despite our findings diverging from prevalent literature emphasising the importance of the customer journey to business performance (Homburg and Tischer, 2023;Marino and Lo Presti, 2018;Terragni and Hassani, 2019), we propose that there might be elements absent or overlooked in the current CJA implementations. For a holistic approach, businesses should earnestly consider integrating the tenets of SPT into their customer journey management. ...
... CJMs represent grouped traces in the event log, unlike our work, where we mine a general model. Existing approaches [26,28,31,43] that use process discovery techniques to mine transition systems ignore the underlying distribution of events. By capturing the probabilities in the model, we can perform a finer analysis and visualization, and provide guidelines to the service provider in case of changing behavior. ...
Chapter
Full-text available
Industry is shifting towards service-based business models, for which user satisfaction is crucial. User satisfaction can be analyzed with user journeys, which model services from the user’s perspective. Today, these models are created manually and lack both formalization and tool-supported analysis. This limits their applicability to complex services with many users. Our goal is to overcome these limitations by automated model generation and formal analyses, enabling the analysis of user journeys for complex services and thousands of users. In this paper, we use stochastic games to model and analyze user journeys. Stochastic games can be automatically constructed from event logs and model checked to, e.g., identify interactions that most effectively help users reach their goal. Since the learned models may get large, we use property-preserving model reduction to visualize users’ pain points to convey information to business stakeholders. The applicability of the proposed method is here demonstrated on two complementary case studies.
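A minimal sketch of the first step that such approaches automate, estimating a stochastic transition model from an event log; the subsequent game construction and model checking (e.g., with a probabilistic model checker) are beyond this snippet, and the log format and activities are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity) pairs, ordered by time per case.
log = [
    ("c1", "login"), ("c1", "search"), ("c1", "help"), ("c1", "purchase"),
    ("c2", "login"), ("c2", "search"), ("c2", "abort"),
]

traces = defaultdict(list)
for case, act in log:
    traces[case].append(act)

# Empirical transition probabilities: the stochastic part of the game model.
counts = defaultdict(Counter)
for t in traces.values():
    for a, b in zip(["<start>"] + t, t + ["<end>"]):
        counts[a][b] += 1

P = {}
for state, cnt in counts.items():
    total = sum(cnt.values())
    P[state] = {nxt: c / total for nxt, c in cnt.items()}

print(P["search"])   # {'help': 0.5, 'abort': 0.5}
```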
... Terragni and Hassani [25] apply process mining to user journey web logs to build process models, and improve the results by clustering journeys. This work has been integrated with a recommender system to suggest service actions that maximize key performance indicators [26], e.g., how often the product page is visited. David et al. present TAPAAL [27], a tool for analyzing timed-arc Petri nets, realized through mappings to UPPAAL. ...
Article
Full-text available
The servitization of business is moving industry to business models driven by customer demand. Customer satisfaction is connected with financial rewards, forcing companies to invest in their users' experience. User journeys describe how users maneuver through a service. Today, user journeys are typically modeled graphically and lack formalization and analysis support. This paper proposes a formalization of user journeys as weighted games between the user and the service provider, and a systematic data-driven method to derive these user journey games from system logs using process mining techniques. As the derived games may contain cycles, we define an algorithm to transform user journey games with cycles into acyclic weighted games, which can be model checked to uncover potential challenges in a company's interactions with its users and derive company strategies to guide users through their journeys. Finally, we propose a user journey sliding-window analysis to detect changes in the user journey over time by model checking a sequence of generated games. Our analysis pipeline has been evaluated on an industrial case study; it revealed design challenges within the studied service and could be used to derive actionable recommendations for improvement.
... Next to the explorative task of process discovery, conformance checking [16] has been introduced, which compares an existing model of process behavior, for example a BPMN model, with observed behavior, pinpointing differences, for example for compliance analysis. Process mining has been very successful in driving process improvement projects in industry [17], but applications to non-typical business processes, such as the analysis of user journeys [18][19][20][21], have also been reported. ...
Article
Full-text available
Recently, there has been increased awareness about the importance of data derived from actual customer journeys, including the subjective customer experience, in the analysis and evaluation of service quality. In this paper, we explore how customer journey analysis and process mining can be combined to advance the analysis and improvement of services. First, we demonstrate the strengths and weaknesses of both methodologies using a specific case study as an illustrative example. Subsequently, we delve into the synergies and challenges inherent in their combination, deriving practical guidelines. We then suggest avenues for further research questions in this cross-disciplinary approach. The paper underscores the potential of aligning these methodologies to provide a more accurate and complete understanding of service delivery, ultimately contributing to the enhancement of customer experience.
... Bernard et al. [8,10] investigate the possibility of using process mining for user journeys: they use hierarchical clustering and user-defined goals to abstract from a large number of journeys [7], and propose a method to discover user journeys from logs at varying levels of granularity [9]. Terragni and Hassani [44] investigate user journeys in the form of web logs and their optimization, building recommender systems that propose user-specific actions to optimize key performance indicators [45]. In contrast, our work focuses on the modeling aspect of user journeys with active objects and simulations to gain prescriptive insights into the service provider behavior and user journeys. ...
Chapter
Full-text available
The servitization of business makes companies increasingly dependent on providing carefully designed user experiences for their service offerings. User journeys model services from the user's perspective, but they are today mainly constructed and analyzed manually. Recent work analyzing user journeys as games enables optimal service-provider strategies to be derived automatically, assuming a restricted user behavior. Complementing this work, we here develop an actor-based modeling framework for user journeys that is parametric in user behavior and service-provider strategies, using the active-object modeling language ABS. Strategies for the service provider, such as those derived for user journey games, can be automatically imported into the framework. Our work enables prescriptive simulation-based analyses, as strategies can be evaluated and compared in scenarios with rich user behavior.
... Furthermore, resources need to be allocated to producing unique content in order to increase user engagement (Byun et al., 2020). Big data-powered trigger-based customer journeys help businesses to foresee potential developments and concentrate on delivering exceptional digital experiences (Terragni & Hassani, 2019). Additionally, in contrast to other industries, which can create advertisements seasonally (Zheng et al., 2015;Fitz-Gibbon, 1990), logistic startups require a consistent stream of advertisements in both social media and search engines in order to maintain and develop a digital brand name and customer loyalty. ...
Article
Full-text available
Logistics startups increasingly rely on digital marketing strategies to acquire a competitive advantage. Their main aim is to grow their digital brand name and user engagement in order to secure that advantage. Toward this target, various digital marketing strategies can be implemented to ensure a differentiating factor. A three-stage data-driven methodology was adopted to evaluate the relationships between the parameters and to derive strategies for improving the websites' user engagement and digital brand name. The first part of the study collects data from nine logistics startups' websites over a period of 180 days. The second part employs Fuzzy Cognitive Mapping (FCM) to develop an exploratory diagnostic model that visually depicts the cause-and-effect relationships between the metrics under consideration. In the last part, a predictive simulation model is created to present the intercorrelations between the examined metrics and possible optimization strategies. According to the findings of this study, logistics startups' websites must be developed with fewer web pages and focus on their target customers. Additionally, in contrast with other industries' websites, logistics startups must maintain a steady flow of digital advertisements to optimize brand name and profit.
... Moreover, resources should be allocated to developing consistent, unique, and relevant content to increase customers' readability. Trigger-based customer journeys, powered by big data, enable companies to identify future trends and focus on delivering outstanding digital experiences [123,124]. ...
Article
Full-text available
To acquire competitive differentiation nowadays, logistics businesses must adopt novel strategies. Logistics companies have to consider whether redesigning their marketing plan based on client social media activity and website activity might increase the effectiveness of their digital marketing strategy. Insights from this study will be used to help logistics firms improve the effectiveness of their digital marketing as part of a marketing re-engineering and change management process. An innovative methodology was implemented: behavioral big data were first collected from the logistics companies' social media and websites; next, regression and correlation analyses were conducted, together with a fuzzy cognitive map simulation, to produce optimization scenarios. The results revealed that re-engineering marketing strategies with customer behavioral big data can successfully affect important digital marketing performance metrics. Additionally, social media big data can affect change management and re-engineering processes by reducing operational costs and investing more in social media visibility and less in social media interactivity.
Chapter
The global food technology market was estimated at USD 184.30 billion in 2023 and USD 202.62 billion in 2024, and is projected to grow at a compound annual growth rate (CAGR) of 9.79% from 2024 to 2034, reaching approximately USD 515.83 billion. Technology is driving the growth of the food industry in various positive ways, such as online food delivery in minutes, quality assessment, customer reviews, and hunger reduction. Alongside these advantages, however, it carries concerns such as job displacement, food safety and security issues, regulatory compliance, and sustainability. To overcome these challenges, redesigning the digital food plate through concrete guidelines and regulations is critical. Considering the above perspective, this chapter, adopting the analytical method, examines the role of digital and emerging technologies in shaping the food industry. Furthermore, it critically evaluates the way forward towards sustainability.
Article
Full-text available
In education, e-learning is highly adopted to improve the learning experience and increase learning efficiency and engagement. Yet, an explosion of online learning materials has overwhelmed learners, especially when trying to achieve their learning goals. In this scope, recommender systems are used to guide learners in their learning process by filtering out the available resources to best match their needs, i.e. to offer personalized content and learning paths. Concurrently, process mining has emerged as a valuable tool for comprehending learner behavior during the learning journey. To synergize these disciplines and optimize learning outcomes, our paper introduces an ontology-based framework that aims to recommend an adaptive learning path, driven by a learner’s learning objective, personalized to his learning style, and enriched by the past learning experience of other learners extracted via process mining. The learning path considers pedagogical standards by employing Bloom’s taxonomy within its structure. The framework establishes an Ontological Foundation, to model the Learner, Domain, and Learning Path. Choosing Computer Science as a domain, we construct a knowledge base using synthesized data. For past learning experience, we analyze Moodle log data from 2018 to 2022, encompassing 471 students in the Computer Science and Engineering Department at Frederick University, Cyprus.
Chapter
The customer journey is a marketing concept that describes the path a customer may take until they purchase a product or service. It is therefore pertinent data for organizations, as it gives them better knowledge of customer behavior. This journey can be represented as a map that traces the customer's path through all touchpoints. Several methods exist to produce this map, either manually by a professional or automatically using algorithms; another technique for automatically discovering the customer journey map is process mining. This study introduces a new framework based on configurable process mining to find the customer journey map. Keywords: Customer journey map; Configurable process mining; Process model
Chapter
Process mining is a combination of business process management and machine learning that allows one to automatically discover a process model, compare it with an existing process to verify its conformity, and improve it. With the frequent use of the web and social media, recommender systems are increasingly used to build customer loyalty and smartly simplify access to services. In this paper, we conduct a comparative study of the latest works focusing on how to improve recommender systems using process mining. This study is a first step toward developing a new framework based on configurable process mining. Keywords: Process Mining; Recommender System; Customer Journey; Weblog
Chapter
Full-text available
The inductive miner (IM) can guarantee to return structured process models, but the process behaviours that process trees can represent are limited. Loops in process trees can only be exited after the execution of the “body” part. However, in some cases, it is possible to break a loop structure in the “redo” part. This paper proposes an extension to the process tree notation and the IM to discover and represent break behaviours. We present a case study using a healthcare event log to explore Acute Coronary Syndrome (ACS) patients’ treatment pathways, especially discharge behaviours from the ICU, to demonstrate the usability of the proposed approach in real life. We find that treatment pathways in the ICU are routine behaviour, while discharges from the ICU are break behaviours. The results show that we can successfully discover break behaviours and obtain structured and understandable process models with satisfactory fitness, precision and simplicity.
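For reference, the standard inductive miner (without the break-behaviour extension proposed in this chapter) is available in pm4py; a minimal sketch, assuming a hypothetical XES log file named journeys.xes:

```python
import pm4py

# Discover a process tree with the standard inductive miner (pm4py);
# the break-behaviour extension proposed in the chapter is not part of pm4py.
log = pm4py.read_xes("journeys.xes")
tree = pm4py.discover_process_tree_inductive(log)
print(tree)                                   # nested operator notation

# Convert to a Petri net for conformance checking or visualization.
net, initial_marking, final_marking = pm4py.convert_to_petri_net(tree)
```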
Conference Paper
Full-text available
Event logs capture information about executed activities. However, they do not capture information about activities that could have been performed, i.e., activities that were enabled during a process. Event logs containing information on enabled activities are called translucent event logs. Although it is possible to extract translucent event logs from a running information system, such logs are rarely stored. To increase the availability of translucent event logs, we propose two techniques. The first technique records the system’s states as snapshots. These snapshots are stored and linked to events. A user labels patterns that describe parts of the system’s state. By matching patterns with snapshots, we can add information about enabled activities. We apply our technique in a small setting to demonstrate its applicability. The second technique uses a process model to add information concerning enabled activities to an existing traditional event log. Data containing enabled activities are valuable for process discovery. Using the information on enabled activities, we can discover more correct models.
Chapter
Full-text available
IoT devices supporting business processes (BPs) in sectors like manufacturing, logistics or healthcare collect data on the execution of the processes. In the last years, there has been a growing awareness of the opportunity to use the data these devices generate for process mining (PM) by deriving an event log from a sensor log via event abstraction techniques. However, IoT data are often affected by data quality issues (e.g., noise, outliers) which, if not addressed at the preprocessing stage, will be amplified by event abstraction and result in quality issues in the event log (e.g., incorrect events), greatly hampering PM results. In this paper, we review the literature on PM with IoT data to find the most frequent data quality issues mentioned in the literature. Based on this, we then derive six patterns of poor sensor data quality that cause event log quality issues and propose solutions to avoid or solve them.
Chapter
Full-text available
Human behavior could be represented in the form of a process. Existing process modeling notations, however, are not able to faithfully represent these very flexible and unstructured processes. Additional non-process aware perspectives should be considered in the representation. Control-flow and data dimensions should be combined to build a robust model which can be used for analysis purposes. The work in this paper proposes a new hybrid model in which these dimensions are combined. An enriched conformance checking approach is described, based on the alignment of imperative and declarative process models, which also supports data dimensions from a statistical viewpoint.
Chapter
Full-text available
Object-centric event logs are a format for properly organizing information from different views of a business process into an event log. The novelty of the format is the association of events with objects, which allows different notions of cases to be analyzed. The addition of new features has brought an increase in complexity. Clustering analysis can ease this complexity by letting the analysis be guided by process behaviour profiles. However, identifying which features describe the singularity of each profile is a challenge. In this paper, we present an exploratory study in which we mine frequent patterns on top of clustering analysis as a mechanism for profile characterization. In our study, clustering analysis is applied in a trace clustering fashion over a vector representation of a flattened event log extracted from an object-centric event log, using a unique case notion. Then, frequent patterns are discovered in the event sublogs associated with the clusters and organized according to the original object-centric event log. The results obtained in preliminary experiments show that association rules reveal more evident behaviours in certain profiles. Although the process underlying each cluster may contain the same elements (activities and transitions), the behaviour trends show that the relationships between these elements differ. The observations depicted in our analysis make room to search for subtler knowledge about the business process under scrutiny.
Chapter
Full-text available
Process mining is a family of techniques that support the analysis of operational processes based on event logs. Among the existing event log formats, the IEEE standard eXtensible Event Stream (XES) is the most widely adopted. In XES, each event must be related to a single case object, which may lead to convergence and divergence problems. To solve such issues, object-centric approaches become promising, where objects are the central notion and one event may refer to multiple objects. In particular, the Object-Centric Event Logs (OCEL) standard has been proposed recently. However, the crucial problem of extracting such logs from external sources is still largely unexplored. In this paper, we try to fill this gap by leveraging the Virtual Knowledge Graph (VKG) approach to access data in relational databases. We have implemented this approach in our system, extending it to support both the XES and OCEL standards. We have carried out an experiment over the Dolibarr system. The evaluation results confirm that the system can effectively extract logs and that the performance is scalable.
Chapter
Full-text available
Predictive process monitoring techniques leverage machine learning (ML) to predict future characteristics of a case, such as the process outcome or the remaining run time. Available techniques employ various models and different types of input data to produce accurate predictions. However, from a practical perspective, explainability is another important requirement besides accuracy since predictive process monitoring techniques frequently support decision-making in critical domains. Techniques from the area of explainable artificial intelligence (XAI) aim to provide this capability and create transparency and interpretability for black-box ML models. While several explainable predictive process monitoring techniques exist, none of them leverages textual data. This is surprising since textual data can provide a rich context to a process that numerical features cannot capture. Recognizing this, we use this paper to investigate how the combination of textual and non-textual data can be used for explainable predictive process monitoring and analyze how the incorporation of textual data affects both the predictions and the explainability. Our experiments show that using textual data requires more computation time but can lead to a notable improvement in prediction quality with comparable results for explainability.
Chapter
Full-text available
Aggregation of event data is a key operation in process mining for revealing behavioral features of processes for analysis. It has primarily been studied over sequences of events in event logs. The data model of event knowledge graphs enables new analysis questions requiring new forms of aggregation. We focus on analyzing task executions in event knowledge graphs. We show that existing aggregation operations are inadequate and propose new aggregation operations, formulated as query operators over labeled property graphs. We show on the BPIC’17 dataset that the new aggregation operations allow gaining new insights into differences in task executions, actor behavior, and work division.
Chapter
Full-text available
During the last years, a number of studies have experimented with applying process mining (PM) techniques to smart spaces data. The general goal has been to automatically model human routines as if they were business processes. However, applying process-oriented techniques to smart spaces data comes with its own set of challenges. This paper surveys existing approaches that apply PM to smart spaces and analyses how they deal with the following challenges identified in the literature: choosing a modelling formalism for human behaviour; bridging the abstraction gap between sensor and event logs; and segmenting logs in traces. The added value of this article lies in providing the research community with a common ground for some important challenges that exist in this field and their respective solutions, and to assist further research efforts by outlining opportunities for future work.
Chapter
Full-text available
Public event logs are valuable for process mining research to evaluate process mining artifacts and identify new and promising research directions. Initiatives like the BPI Challenges have provided a series of real-world event logs, including healthcare processes, and have significantly stimulated process mining research. However, the healthcare-related logs provide only excerpts of patient visits in hospitals. The Medical Information Mart for Intensive Care (MIMIC)-IV database is a publicly available relational database that includes data on patient treatment in a tertiary academic medical center in Boston, USA. It covers complex hospital care processes end to end. To facilitate the use of MIMIC-IV in process mining and to increase the reproducibility of research with MIMIC, this paper provides a framework consisting of a method, an event hierarchy, and a log extraction tool for extracting useful event logs from the MIMIC-IV database. We demonstrate the framework on a heart failure treatment process, show how logs on different abstraction levels can be generated, and provide configuration files to generate event logs of previous process mining works with MIMIC.
Chapter
Full-text available
Anomaly detection can identify deviations in event logs and allows businesses to infer inconsistencies, bottlenecks, and optimization opportunities in their business processes. In recent years, various anomaly detection algorithms for business processes have been proposed based on either process discovery or machine learning algorithms. While there are apparent differences between machine learning and process discovery approaches, it is often unclear how they perform in comparison. Furthermore, deep learning research in other domains has shown that advancements did not come solely from improved model architectures but were often due to minor pre-processing and training procedure refinements. For this reason, this paper aims to set up a broad benchmark and establish a baseline for deep learning-based anomaly detection of process instances. To this end, we introduce a simple LSTM-based anomaly detector utilizing a collection of minor refinements and compare it with existing approaches. The results suggest that the proposed method can consistently and significantly outperform existing approaches on a large number of event logs.
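A minimal sketch of the kind of LSTM-based detector described, assuming activities are integer-encoded: the model predicts the next activity, and a trace's anomaly score is its average next-activity negative log-likelihood. The hyperparameters are illustrative, not the paper's configuration or refinements.

```python
import torch
import torch.nn as nn

class NextActivityLSTM(nn.Module):
    """Predicts the next activity; traces whose activities are consistently
    hard to predict receive high anomaly scores."""
    def __init__(self, n_activities, emb_dim=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(n_activities, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_activities)

    def forward(self, x):              # x: (batch, seq_len) of activity ids
        h, _ = self.lstm(self.emb(x))
        return self.out(h)             # (batch, seq_len, n_activities) logits

def anomaly_score(model, trace):
    """Average negative log-likelihood of each next activity given its prefix."""
    x = torch.tensor([trace[:-1]])
    y = torch.tensor([trace[1:]])
    logits = model(x)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), y.reshape(-1)).item()

model = NextActivityLSTM(n_activities=10)
print(anomaly_score(model, [1, 4, 2, 7, 9]))   # untrained model: score is noise
```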
Chapter
Full-text available
When multiple objects are involved in a process, there is an opportunity for processes to be discovered from different angles, with new information that might previously not have been analyzed from a single-object point of view. This requires that all information on event/object attributes and their values be stored within the logs, including attributes that have a list of values or attributes whose values change over time. It also requires that attributes can unambiguously be linked to an object, an event, or both. As such, object-centric event logs are an interesting development in process mining, as they support the presence of multiple types of objects. First, this paper shows that current object-centric event log formats do not support these aspects to their full potential, since dynamic object attributes (attributes with changing values) are not supported by existing formats. Next, this paper introduces a novel enriched object-centric event log format tackling the aforementioned issues, alongside an algorithm that automatically translates XES logs to this Data-aware OCEL (DOCEL) format.
Chapter
Full-text available
Process discovery is a family of techniques that helps to comprehend processes from their data footprints. Yet, as processes change over time so should their corresponding models, and failure to do so will lead to models that under- or over-approximate behaviour. We present a discovery algorithm that extracts declarative processes as Dynamic Condition Response (DCR) graphs from event streams. Streams are monitored to generate temporal representations of the process, later processed to create declarative models. We validated the technique by identifying drifts in a publicly available dataset of event streams. The metrics extend the Jaccard similarity measure to account for process change in a declarative setting. The technique and the data used for testing are available online.
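For context, the base Jaccard measure that the chapter's metrics extend compares two declarative models by their constraint sets; writing $C(M)$ for the constraint set of model $M$:

$$J(M_1, M_2) = \frac{|C(M_1) \cap C(M_2)|}{|C(M_1) \cup C(M_2)|}$$

A drop in similarity between models mined from consecutive stream segments then signals a potential drift; the chapter's declarative-setting extension of this measure is not reproduced here.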
Chapter
Full-text available
Organizations increasingly use process mining techniques to gain insight into their processes. Process mining techniques can be used to monitor and/or enhance processes. However, the impact of processes on the people involved, in terms of unfair discrimination, has not been studied. Another neglected area is the impact of applying process mining techniques on the fairness of processes. In this paper, we overview and categorize the existing fairness concepts in machine learning. Moreover, we summarize the areas where fairness is relevant to process mining and provide an approach to applying existing fairness definitions in process mining. Finally, we present some of the fairness-related challenges in processes.
Chapter
Full-text available
Analysing the treatment pathways in real-world health data can provide valuable insight for clinicians and decision-makers. However, the procedures for acquiring real-world data for research can be restrictive, time-consuming and risks disclosing identifiable information. Synthetic data might enable representative analysis without direct access to sensitive data. In the first part of our paper, we propose an approach for grading synthetic data for process analysis based on its fidelity to relationships found in real-world data. In the second part, we apply our grading approach by assessing cancer patient pathways in a synthetic healthcare dataset (The Simulacrum provided by the English National Cancer Registration and Analysis Service) using process mining. Visualisations of the patient pathways within the synthetic data appear plausible, showing relationships between events confirmed in the underlying non-synthetic data. Data quality issues are also present within the synthetic data which reflect real-world problems and artefacts from the synthetic dataset’s creation. Process mining of synthetic data in healthcare is an emerging field with novel challenges. We conclude that researchers should be aware of the risks when extrapolating results produced from research on synthetic data to real-world scenarios and assess findings with analysts who are able to view the underlying data.
Chapter
Full-text available
A lot of recent literature on outcome-oriented predictive process monitoring focuses on using models from machine and deep learning. In this literature, it is assumed that the outcome labels of the historical cases are all known. However, in some cases, the labelling of cases is incomplete or inaccurate. For instance, you might only observe negative customer feedback, or fraudulent cases might remain unnoticed. Such cases are typically handled in the so-called positive and unlabelled (PU) setting, where the data set consists of some positively labelled examples and examples without a positive label that might still have a positive outcome. In this work, we show, using a selection of event logs from the literature, the negative impact of mislabelling cases as negative, more specifically when using XGBoost and LSTM neural networks. Furthermore, we show promising results on real-life datasets mitigating this effect, by changing the loss function used by a set of models during training to those of unbiased Positive-Unlabelled (uPU) or non-negative Positive-Unlabelled (nnPU) learning.
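For reference, the non-negative PU (nnPU) risk estimator mentioned above (Kiryo et al., 2017) replaces the possibly negative unlabelled-risk term of unbiased PU learning with a clipped version; with class prior $\pi_p$ and decision function $g$:

$$\hat{R}_{\mathrm{nnPU}}(g) \;=\; \pi_p\,\hat{R}_p^{+}(g) \;+\; \max\!\left(0,\; \hat{R}_u^{-}(g) - \pi_p\,\hat{R}_p^{-}(g)\right)$$

where $\hat{R}_p^{+}$ and $\hat{R}_p^{-}$ are the empirical risks of the labelled positives scored as positive and negative, respectively, and $\hat{R}_u^{-}$ is the risk of the unlabelled set treated as negative.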
Chapter
Full-text available
Assigning resources in business processes execution is a repetitive task that can be effectively automated. However, different automation methods may give varying results that may not be optimal. Proper resource allocation is crucial as it may lead to significant cost reductions or increased effectiveness that results in increased revenues. In this work, we first propose a novel representation that allows the modeling of a multi-process environment with different process-based rewards. These processes can share resources that differ in their eligibility. Then, we use double deep reinforcement learning to look for an optimal resource allocation policy. We compare those results with two popular strategies that are widely used in the industry. Learning optimal policy through reinforcement learning requires frequent interactions with the environment, so we also designed and developed a simulation engine that can mimic real-world processes. The results obtained are promising. Deep reinforcement learning based resource allocation achieved significantly better results compared to two commonly used techniques.
Chapter
Full-text available
Customer journey analysis is important for organizations that want to learn as much as possible about the main behavior of their customers. This provides the basis for improving the customer experience within the organization. This paper addresses the problem of predicting the occurrence of a certain activity of interest in the remainder of the customer journey following the occurrence of another specific activity. For this, we propose the HIAP framework, which uses process mining techniques to analyze customer journeys. Different prediction models are researched to investigate which model is most suitable for high-importance activity prediction. Furthermore, the effect of using a sliding-window or landmark model for (re)training a model is investigated. The framework is evaluated using a real health insurance dataset and a benchmark dataset. The efficiency and prediction quality results highlight the usefulness of the framework under various realistic online business settings.
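A minimal sketch of the two (re)training regimes compared in the chapter, assuming completed cases arrive in batches; the window size and batch handling are illustrative only.

```python
def training_sets(case_batches, window=1000, landmark=False):
    """Yield one training set per retraining point: a landmark model keeps
    all completed cases seen so far, a sliding-window model only the most
    recent `window` cases."""
    history = []
    for batch in case_batches:       # completed cases arriving over time
        history.extend(batch)
        yield list(history) if landmark else history[-window:]

# Usage: retrain the chosen predictor on each yielded set and compare
# prediction quality over time between the two regimes.
```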
Chapter
Full-text available
In recent years, AutoML has emerged as a promising technique for reducing computational and time cost by automating the development of machine learning models. Existing AutoML tools cannot be applied directly to process predictive monitoring (PPM) because they do not support several PPM-specific configuration parameters, such as trace bucketing or encoding; they specialize only in finding the best configuration of machine learning model hyperparameters. In this paper, we present a simple yet extensible framework for AutoML in PPM. The framework uses genetic algorithms to explore a configuration space containing both PPM-specific parameters and traditional machine learning model hyperparameters. We design four types of experiments to verify the effectiveness of the proposed approach, comparing its performance with random search of the configuration space using two publicly available event logs. The results demonstrate that the proposed approach consistently outperforms random search.
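A compact sketch of a genetic search over such a joint configuration space, with hypothetical PPM-specific parameters (bucketing, encoding) next to model hyperparameters; `fitness` stands in for a user-supplied evaluation, e.g. cross-validated AUC on an event log. This illustrates the general idea, not the paper's operators or settings.

```python
import random

SPACE = {  # hypothetical PPM configuration space
    "bucketing": ["single", "prefix", "state"],
    "encoding": ["last_state", "aggregate", "index"],
    "n_estimators": [100, 250, 500],
    "max_depth": [4, 8, 16],
}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(cfg, rate=0.2):
    return {k: random.choice(SPACE[k]) if random.random() < rate else v
            for k, v in cfg.items()}

def evolve(fitness, pop_size=20, generations=10):
    pop = [{k: random.choice(v) for k, v in SPACE.items()}
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]          # keep the best quarter
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)
```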
Chapter
Full-text available
In this paper, we introduce the SAP Signavio Academic Models (SAP-SAM) dataset, a collection of hundreds of thousands of business models, mainly process models in BPMN notation. The model collection is a subset of the models that were created over the course of roughly a decade on academic.signavio.com, a free-of-charge software-as-a-service platform that researchers, teachers, and students can use to create business (process) models. We provide a preliminary analysis of the model collection, as well as recommendations on how to work with it. In addition, we discuss potential use cases and limitations of the model collection from academic and industry perspectives.
Chapter
Full-text available
We present a method and prototype tool supporting participatory mapping of domain activities to event data recorded in information systems via the system interfaces. The aim is to facilitate responsible secondary use of event data recorded in information systems, such as process mining and the construction of predictive AI models. Another identified possible benefit is the support for increasing data quality by using the mapping to educate new users in how to register data, thereby increasing the consistency with which domain activities are recorded. We illustrate the method on two cases, one from a job center in a Danish municipality and another from a Danish hospital using the healthcare platform from Epic.
Chapter
Full-text available
This paper presents an approach of using methods of process mining and rule-based artificial intelligence to analyze and understand study paths of students based on campus management system data and study program models. Process mining techniques are used to characterize successful study paths, as well as to detect and visualize deviations from expected plans. These insights are combined with recommendations and requirements of the corresponding study programs extracted from examination regulations. Here, event calculus and answer set programming are used to provide models of the study programs which support planning and conformance checking while providing feedback on possible study plan violations. In its combination, process mining and rule-based artificial intelligence are used to support study planning and monitoring by deriving rules and recommendations for guiding students to more suitable study paths with higher success rates. Two applications will be implemented, one for students and one for study program designers.
Chapter
Full-text available
Constraint monitoring aims to monitor the violation of constraints in business processes, e.g., an invoice should be cleared within 48 h after the corresponding goods receipt, by analyzing event data. Existing techniques for constraint monitoring assume that a single case notion exists in a business process, e.g., a patient in a healthcare process, and each event is associated with the case notion. However, in reality, business processes are object-centric, i.e., multiple case notions (objects) exist, and an event may be associated with multiple objects. For instance, an Order-To-Cash (O2C) process involves order, item, delivery, etc., and they interact when executing an event, e.g., packing multiple items together for a delivery. The existing techniques produce misleading insights when applied to such object-centric business processes. In this work, we propose an approach to monitoring constraints in object-centric business processes. To this end, we introduce Object-Centric Constraint Graphs (OCCGs) to represent constraints that consider the interaction of objects. Next, we evaluate the constraints represented by OCCGs by analyzing Object-Centric Event Logs (OCELs) that store the interaction of different objects in events. We have implemented a web application to support the proposed approach and conducted two case studies using a real-life SAP ERP system.
Chapter
Full-text available
Computer-based education relies on information systems to support teaching and learning processes. These systems store trace data about the interaction of learners with their different functionalities. Process mining techniques have been used to evaluate these traces and provide instructors with insights into the behavior of students. However, an analysis of students' behavior in solving open-question examinations, combined with the marks they received, is still missing. Such an analysis can support instructors not only in improving the design of future editions of the course, but also in improving the structure of online and physical evaluations. In this paper, we use process mining techniques to evaluate the behavioral patterns of students solving computer-based open-ended exams and their correlation with the grades. Our results show patterns of behavior associated with the marks received. We discuss how these results may support the instructor in elaborating future open-question examinations.
Chapter
Full-text available
In recent years, hospitals and other care providers in the Netherlands are coping with a widespread nursing shortage and a directly related increase in nursing workload. This nursing shortage combined with the high nursing workload is associated with higher levels of burnout and reduced job satisfaction among nurses. However, not only the nurses, but also the patients are affected as an increasing nursing workload adversely affects patient safety and satisfaction. Therefore, the aim of this research is to predict the care acuity corresponding to an individual patient for the next admission day, by using the available structured hospital data of the previous admission days. For this purpose, we make use of an LSTM model that is able to predict the care acuity of the next day, based on the hospital data of all previous days of an admission. In this paper, we elaborate on the architecture of the LSTM model and we show that the prediction accuracy of the LSTM model increases with the increase of the available amount of historical event data. We also show that the model is able to identify care acuity differences in terms of the amount of support needed by the patient. Moreover, we discuss how the predictions can be used to identify which patient care related characteristics and different types of nursing activities potentially contribute to the care acuity of a patient.
Chapter
Full-text available
Care pathways in hospitals around the world reported significant disruption during the recent COVID-19 pandemic, but measuring the actual impact is more problematic. Process mining can be useful for hospital management to measure the conformance of real-life care to what might be considered normal operations. In this study, we aim to demonstrate that process mining can be used to investigate process changes associated with complex disruptive events. We studied perturbations to accident and emergency (A&E) and maternity pathways in a UK public hospital during the COVID-19 pandemic. Coincidentally, the hospital had implemented a Command Centre approach for patient-flow management, affording an opportunity to study both the planned improvement and the disruption due to the pandemic. Our study proposes and demonstrates a method for measuring and investigating the impact of such planned and unplanned disruptions affecting hospital care pathways. We found that during the pandemic, both A&E and maternity pathways had measurable reductions in the mean length of stay and a measurable drop in the percentage of pathways conforming to normative models. There were no distinctive patterns in the monthly mean values of length of stay or conformance throughout the phases of the installation of the hospital's new Command Centre approach. Due to a deficit in the available A&E data, the findings for A&E pathways could not be interpreted.
Conference Paper
Full-text available
The discipline of process mining has a solid track record of successful applications to the healthcare domain. Within such research space, we conducted a case study related to the Intensive Care Unit (ICU) ward of the Uniklinik Aachen hospital in Germany. The aim of this work is twofold: developing a normative model representing the clinical guidelines for the treatment of COVID-19 patients, and analyzing the adherence of the observed behavior (recorded in the information system of the hospital) to such guidelines. We show that, through conformance checking techniques, it is possible to analyze the care process for COVID-19 patients, highlighting the main deviations from the clinical guidelines. The results provide physicians with useful indications for improving the process and ensuring service quality and patient satisfaction. We share the resulting model as an open-source BPMN file.
Chapter
Full-text available
Conformance checking is a process mining technique that allows verifying the conformance of process instances to a given model. Many conformance checking algorithms provide quantitative information about the conformance of a process instance through metrics such as fitness. Fitness measures to what degree the model allows the behavior observed in the event log. Conventional fitness does not consider the individual severity of deviations. In cases where there are rules that are more important to comply with than others, fitness consequently does not take all factors into account. In the field of medicine, for example, there are guideline recommendations for clinical treatment that have information about their importance and soundness, making it essential to distinguish between them. Therefore, we introduce an alignment-based conformance checking approach that considers the importance of individual specifications and weights violations. The approach is evaluated with real patient data and evidence-based guideline recommendations. Using this approach, it was possible to integrate guideline recommendation metadata into the conformance checking process and to weight violations individually.
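The following sketch illustrates the general idea of severity-weighted, alignment-based fitness, assuming alignments are given as (log move, model move) pairs with '>>' marking skips; the weights and cost scheme are hypothetical, not the paper's exact guideline-metadata integration.

```python
# Hypothetical severity weights (e.g., derived from guideline metadata).
SEVERITY = {"administer_anticoagulant": 5.0, "document_notes": 0.5}

def weighted_fitness(alignment, default=1.0):
    """Alignment = list of (log_move, model_move); '>>' marks a skipped move.
    Deviations cost the severity of the affected activity, not a unit cost."""
    cost = worst = 0.0
    for log_move, model_move in alignment:
        act = model_move if log_move == ">>" else log_move
        w = SEVERITY.get(act, default)
        worst += w
        if ">>" in (log_move, model_move):      # move on log or model only
            cost += w
    return 1.0 - cost / worst if worst else 1.0

aligned = [("triage", "triage"), (">>", "administer_anticoagulant"),
           ("document_notes", ">>")]
print(weighted_fitness(aligned))                # 1 - 5.5/6.5 ~= 0.154
```

Skipping a high-severity guideline step thus lowers fitness far more than skipping a documentation step, which is the distinction conventional unit-cost fitness cannot make.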
Chapter
Full-text available
In the area of industrial process mining, privacy-preserving event data publication is becoming increasingly relevant. Consequently, the trade-off between high data utility and quantifiable privacy poses new challenges. State-of-the-art research mainly focuses on differentially private trace variant construction based on prefix expansion methods. However, these algorithms face several practical limitations such as high computational complexity, introducing fake variants, removing frequent variants, and a bounded variant length. In this paper, we introduce a new approach for direct differentially private trace variant release which uses anonymized partition selection strategies to overcome the aforementioned restraints. Experimental results on real-life event data show that our algorithm outperforms state-of-the-art methods in terms of both plain data utility and result utility preservation.
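To illustrate the flavour of direct, partition-selection-style variant release (a simplified sketch, not the paper's exact algorithm): each trace-variant count is perturbed with Laplace noise and released only if the noisy count clears a threshold, so no fake variants are introduced and no prefix expansion is needed.

```python
import numpy as np

def dp_variant_release(variant_counts, eps=1.0, threshold=5):
    """Perturb each variant count with Laplace noise and release it only if
    the noisy count clears the threshold. No prefix expansion, no fake
    variants; infrequent variants are suppressed rather than distorted."""
    released = {}
    for variant, count in variant_counts.items():
        noisy = count + np.random.laplace(scale=1.0 / eps)
        if noisy >= threshold:
            released[variant] = max(1, int(round(noisy)))
    return released

counts = {("login", "search", "buy"): 120, ("login", "abort"): 3}
print(dp_variant_release(counts))   # the rare variant is usually suppressed
```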
Chapter
Full-text available
Process mining is a set of techniques that are used by organizations to understand and improve their operational processes. The first essential step in designing any process reengineering procedure is to find process improvement opportunities. In existing work, it is usually assumed that the set of problematic process instances in which an undesirable outcome occurs is known a priori or is easily detectable. The process enhancement procedure then involves finding the root causes and the treatments for the problem in those process instances. For example, the set of problematic instances is taken to be those with outlier values, or with values smaller/bigger than a given threshold, in one of the process features. However, on various occasions, this approach misses many process enhancement opportunities not captured by these problematic process instances. To overcome this issue, we formulate finding the process enhancement areas as a context-sensitive anomaly/outlier detection problem. We define a process enhancement area as a set of situations (process instances or prefixes of process instances) where the process performance is surprising. We aim to characterize those situations where process performance is significantly different from what was expected considering its performance in similar situations. To evaluate the validity and relevance of the proposed approach, we have implemented and evaluated it on a real-life event log.
Chapter
Full-text available
To improve the user experience, service providers may systematically record and analyse user interactions with a service using event logs. User journeys model these interactions from the user’s perspective. They can be understood as event logs created by two independent parties, the user and the service provider, both controlling their share of actions. We propose multi-party event logs as an extension of event logs with information on the parties, allowing user journeys to be analysed as weighted games between two players. To reduce the size of games for complex user journeys, we identify decision boundaries at which the outcome of the game is determined. Decision boundaries identify subgames that are equivalent to the full game with respect to the final outcome of user journeys. The decision boundary analysis from multi-party event logs has been implemented and evaluated on the BPI Challenge 2017 event log with promising results, and can be connected to existing process mining pipelines.
Chapter
Full-text available
Data and process mining techniques can be applied in many areas to gain valuable insights. For many reasons, access to real-world business and medical data is severely limited. However, research, and especially the development of new methods, depends on a sufficient basis of realistic data, and this progress is hindered by the lack of it. This applies in particular to domains that use personal data, such as healthcare. With adequate quality, synthetic data can be a solution to this problem. In the procedural field, some approaches have already been presented that generate synthetic data based on a process model. However, only a few have included the data perspective so far, and data semantics, which is crucial for the quality of the generated data, has not yet been considered. Therefore, in this paper we present the multi-perspective event log generation approach SAMPLE, which considers the data perspective and, in particular, its semantics. The evaluation of the approach is based on a process model for the treatment of malignant melanoma. As a result, we were able to integrate the semantics of data into the log generation process and identify new challenges.
Article
Predictive Process Analytics is becoming an essential aid for organizations, providing online operational support of their processes. However, process stakeholders need to be provided with an explanation of the reasons why a given process execution is predicted to behave in a certain way. Otherwise, they will be unlikely to trust the predictive monitoring technology and, hence, adopt it. This paper proposes a predictive analytics framework that is also equipped with explanation capabilities based on the game theory of Shapley Values. The framework has been implemented in the IBM Process Mining suite and commercialized for business users. The framework has been tested on real-life event data to assess the quality of the predictions and of the corresponding explanations. In particular, a user evaluation has been performed in order to understand if the explanations provided by the system were intelligible to process stakeholders.
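For intuition, Shapley values can be computed exactly for a small feature set by averaging each feature's marginal contribution over all coalitions. The toy predictor, feature names, and baseline below are illustrative assumptions, not the framework's internals.

```python
# Minimal exact Shapley-value computation for explaining one prediction.
from itertools import combinations
from math import factorial

features = ["duration_so_far", "num_rework_loops", "resource_load"]
x = {"duration_so_far": 12.0, "num_rework_loops": 3.0, "resource_load": 0.8}
baseline = {f: 0.0 for f in features}

def model(inp):                      # toy predictor standing in for the real one
    return 2.0 * inp["duration_so_far"] + 5.0 * inp["num_rework_loops"]

def value(coalition):                # features outside the coalition use baseline
    inp = {f: (x[f] if f in coalition else baseline[f]) for f in features}
    return model(inp)

n = len(features)
for f in features:
    others = [g for g in features if g != f]
    phi = 0.0
    for k in range(n):
        for S in combinations(others, k):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += w * (value(set(S) | {f}) - value(S))
    print(f, round(phi, 3))          # per-feature contribution to the prediction
```

The exact enumeration is exponential in the number of features; production systems rely on sampling-based approximations, but the attributions it produces are the quantities shown to stakeholders.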
Article
Advertisers apply customer journey analyses to gain insights into customer behavior at various touchpoints and measure their advertising impact using attribution models. Along the customer journey, customers constantly adjust goals and expectations, generating new data that can influence attribution results. However, extant attribution models do not capture changes in customer behavior over time, which limits their results’ meaningfulness. In response, the authors present a dynamic approach to customer journey analysis based on Markov chains that updates attribution results on a rolling basis by sequentially considering new data. Applying this approach to a nine-year data set of 45,694 customer journeys leads to empirical generalizations and channel-specific insights. A model comparison shows that rolling determination increases the attribution accuracy and extends previous research. Furthermore, data collection periods influence advertising impact and should be considered in attribution modeling. Thus, the study enables advertisers to interactively manage their marketing strategy and improve budget allocation.
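The removal-effect logic behind Markov-chain attribution can be sketched as follows: estimate transition probabilities from journeys, compute the baseline conversion probability, then recompute it with each channel removed; a rolling variant would simply repeat this on a sliding window as new journeys arrive. The journey encoding and state names are illustrative.

```python
# Sketch of Markov-chain attribution via removal effects.
from collections import defaultdict

def transition_probs(journeys):
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = ["start"] + path + ["conv" if converted else "null"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {s: {t: c / sum(nxt.values()) for t, c in nxt.items()}
            for s, nxt in counts.items()}

def conversion_prob(P, removed=None, iters=200):
    """Probability of reaching 'conv' from 'start', by value iteration."""
    v = defaultdict(float)
    v["conv"] = 1.0
    for _ in range(iters):
        for s, nxt in P.items():
            if s == removed:
                v[s] = 0.0           # removed channel behaves like "null"
            else:
                v[s] = sum(p * v[t] for t, p in nxt.items())
    return v["start"]

journeys = [(["display", "search"], True), (["search"], True),
            (["display"], False), (["email", "search"], False)]
P = transition_probs(journeys)
base = conversion_prob(P)
effects = {c: base - conversion_prob(P, removed=c)
           for c in ["display", "search", "email"]}
total = sum(effects.values())
print({c: round(e / total, 3) for c, e in effects.items()})
```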
Conference Paper
Full-text available
Customer journey maps (CJMs) are used to understand customers' behavior, and ultimately to better serve them. This new approach is used in numerous disciplines for different purposes. As a response, several software applications have emerged. Although they provide interfaces to understand CJMs, they lack measures to assist in decision making. We contribute by proposing a CJM model. We show its potential by using it with process mining, a data analytics technique that we leverage to assess the impact of the journey's duration on the customer experience. The model brings data scientists and customer journey planners closer together, the first step in gaining a better understanding of customer behavior. This study also highlights the prospective value of process mining for CJM analysis.
Conference Paper
Full-text available
Given an event log describing observed behaviour, process discovery aims to find a process model that ‘best’ describes this behaviour. A large variety of process discovery algorithms has been proposed. However, no existing algorithm returns a sound model in all cases (free of deadlocks and other anomalies), handles infrequent behaviour well and finishes quickly. We present a technique able to cope with infrequent behaviour and large event logs, while ensuring soundness. The technique has been implemented in ProM and we compare the technique with existing approaches in terms of quality and performance.
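The technique is available in ProM; a related implementation of the Inductive Miner family, with a configurable noise threshold for infrequent behaviour, is exposed by the pm4py library. The sketch below assumes a recent pm4py version and a local XES file (web_journeys.xes is a hypothetical name).

```python
# Hedged sketch: sound process discovery with infrequent-behaviour
# filtering, using pm4py's Inductive Miner variant.
import pm4py

log = pm4py.read_xes("web_journeys.xes")          # hypothetical log file
# noise_threshold filters infrequent behaviour while keeping the model sound
net, im, fm = pm4py.discover_petri_net_inductive(log, noise_threshold=0.2)
pm4py.view_petri_net(net, im, fm)
```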
Article
Full-text available
We present a new weighted session similarity measure to capture the browsing interests of users in web usage profiles discovered from web log data. We base our similarity measure on the reasonable assumption that when users spend longer times on pages or revisit pages in the same session, such pages are very likely of greater interest to the user. The proposed similarity measure combines structural similarity with session-wise page significance. The latter, representing the degree of user interest, is computed using the frequency and duration of a page access. Web usage profiles are generated using this similarity measure by applying a fuzzy clustering algorithm to web log data. For evaluating the effectiveness of the proposed measure, we adapt two model-based collaborative filtering algorithms for recommending pages. Experimental results show considerable improvement in the overall performance of recommender systems as compared to the use of other existing similarity measures.
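A minimal sketch of such a measure, assuming sessions recorded as (page, dwell time) pairs: per-page significance combines access frequency with relative duration, and sessions are compared with a cosine over the significance-weighted vectors. The exact weighting used in the paper may differ.

```python
# Sketch of a session similarity weighted by page frequency and dwell time.
def significance(session):
    """Per-page significance from access frequency and total duration."""
    sig = {}
    for page, seconds in session:                 # session: [(page, dwell), ...]
        freq, dur = sig.get(page, (0, 0.0))
        sig[page] = (freq + 1, dur + seconds)
    total = sum(d for _, d in sig.values()) or 1.0
    return {p: f * (d / total) for p, (f, d) in sig.items()}

def session_similarity(s1, s2):
    """Cosine-style similarity over significance-weighted page vectors."""
    a, b = significance(s1), significance(s2)
    num = sum(a[p] * b[p] for p in set(a) & set(b))
    den = (sum(v * v for v in a.values()) ** 0.5 *
           sum(v * v for v in b.values()) ** 0.5) or 1.0
    return num / den

s1 = [("home", 5), ("product", 40), ("product", 25)]
s2 = [("home", 8), ("product", 30), ("cart", 12)]
print(round(session_similarity(s1, s2), 3))
```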
Conference Paper
Full-text available
The dynamic nature of the Web and its increasing importance as an economic platform create the need for new methods and tools for business efficiency. Current Web analytics tools do not provide the necessary abstracted view of the underlying customer processes and critical paths of site visitor behavior. Such information can offer insights for businesses to react effectively and efficiently. We propose applying Business Process Management (BPM) methodologies to e-commerce Website logs, and present the challenges, results and potential benefits of such an approach. We use the Business Process Insight (BPI) platform, a collaborative process intelligence toolset that implements the discovery of loosely-coupled processes, and includes novel process mining techniques suitable for the Web. Experiments are performed on custom click-stream logs from a large online travel and booking agency. We first compare Web clicks and BPM events, and then present a methodology to classify and transform URLs into events. We evaluate traditional and custom process mining algorithms to extract business models from real-life Web data. The resulting models present an abstracted view of the relation between pages, exit points, and critical paths taken by customers. Such models show important improvements over current state-of-the-art Web analytics and aid high-level decision making and optimization of e-commerce sites.
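The URL-to-event step can be illustrated with a simple rule table that maps raw click URLs to abstract event labels before any mining is applied; the patterns below are invented for illustration.

```python
# Illustrative sketch of classifying click-stream URLs into process events.
import re

RULES = [(r"^/search", "Search"),
         (r"^/hotel/\d+", "ViewHotel"),
         (r"^/cart", "AddToCart"),
         (r"^/checkout/payment", "Payment"),
         (r"^/checkout/confirm", "Confirmation")]

def url_to_event(url):
    for pattern, event in RULES:
        if re.match(pattern, url):
            return event
    return "Other"                    # fallback bucket for unmapped pages

clicks = ["/search?q=rome", "/hotel/42", "/cart", "/checkout/payment"]
print([url_to_event(u) for u in clicks])
```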
Conference Paper
Full-text available
Process Mining is a technique for extracting process models from execution logs. This is particularly useful in situations where people have an idealized view of reality. Real-life processes turn out to be less structured than people tend to believe. Unfortunately, traditional process mining approaches have problems dealing with unstructured processes. The discovered models are often "spaghetti-like", showing all details without distinguishing what is important and what is not. This paper proposes a new process mining approach to overcome this problem. The approach is configurable and allows for different faithfully simplified views of a particular process. To do this, the concept of a roadmap is used as a metaphor. Just like different roadmaps provide suitable abstractions of reality, process models should provide meaningful abstractions of operational processes encountered in domains ranging from healthcare and logistics to web services and public administration.
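The roadmap metaphor translates, in its simplest form, into thresholding the directly-follows graph: a coarse view keeps only highly significant edges, while lowering the threshold reveals more detail. This sketch uses relative edge frequency as the significance notion; the actual approach also aggregates and clusters nodes.

```python
# Sketch of configurable, roadmap-style abstraction by edge significance.
from collections import Counter

log = [["a", "b", "d"]] * 40 + [["a", "c", "d"]] * 55 + [["a", "d"]] * 5
edges = Counter((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))
total = sum(edges.values())

def simplified_view(threshold):
    """Keep only edges whose relative frequency exceeds the threshold."""
    return {e: n for e, n in edges.items() if n / total >= threshold}

print(simplified_view(0.05))   # coarse roadmap: rare shortcut (a, d) disappears
print(simplified_view(0.01))   # more detailed view: all edges retained
```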
Conference Paper
Full-text available
In today's fast-changing business environment, flexible Process Aware Information Systems (PAISs) are required to allow companies to rapidly adjust their business processes to changes in the environment. However, increasing flexibility in large PAISs usually leads to less guidance for its users and consequently requires more experienced users. To allow for flexible systems with a high degree of support, intelligent user assistance is required. In this paper we propose a recommendation service which, when used in combination with flexible PAISs, can support end users during process execution by giving recommendations on possible next steps. Recommendations are generated based on similar past process executions by considering the specific optimization goals. In this paper we also evaluate the proposed recommendation service by means of experiments.
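A bare-bones version of such a service can be sketched as follows: find completed executions that share the running instance's prefix and rank candidate next steps by the average value of the optimization goal among the traces that took them. The trace and KPI encoding is invented for illustration; the proposed service uses richer notions of similarity.

```python
# Sketch: recommend next steps from similar past executions, ranked by goal.
from collections import defaultdict

def recommend_next(history, prefix):
    """history: list of (trace, kpi) pairs; higher kpi is better."""
    scores, counts = defaultdict(float), defaultdict(int)
    k = len(prefix)
    for trace, kpi in history:
        if len(trace) > k and trace[:k] == prefix:   # same prefix so far
            nxt = trace[k]
            scores[nxt] += kpi
            counts[nxt] += 1
    ranked = sorted(scores, key=lambda a: scores[a] / counts[a], reverse=True)
    return [(a, scores[a] / counts[a]) for a in ranked]

history = [(["browse", "compare", "buy"], 1.0),
           (["browse", "compare", "leave"], 0.0),
           (["browse", "support", "buy"], 1.0)]
print(recommend_next(history, ["browse"]))   # best-scoring next steps first
```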
Conference Paper
Full-text available
A common task of recommender systems is to improve customer experience through personalized recommendations based on prior implicit feedback. These systems passively track different sorts of user behavior, such as purchase history, watching habits and browsing activity, in order to model user preferences. Unlike the much more extensively researched explicit feedback, we do not have any direct input from the users regarding their preferences. In particular, we lack substantial evidence on which products consumers dislike. In this work we identify unique properties of implicit feedback datasets. We propose treating the data as indication of positive and negative preference associated with vastly varying confidence levels. This leads to a factor model which is especially tailored for implicit feedback recommenders. We also suggest a scalable optimization procedure, which scales linearly with the data size. The algorithm is used successfully within a recommender system for television shows. It compares favorably with well-tuned implementations of other known methods. In addition, we offer a novel way to give explanations to recommendations given by this factor model.
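The resulting factor model can be sketched with a small alternating-least-squares loop: preferences are binarized, and confidence grows with the observed interaction strength. This is a teaching-size version that materializes the confidence matrices directly, not the scalable linear-time solver described in the work.

```python
# Compact ALS sketch for the implicit-feedback factor model:
# preference p_ui = 1 if r_ui > 0, confidence c_ui = 1 + alpha * r_ui.
import numpy as np

def implicit_als(R, factors=2, alpha=40.0, reg=0.1, iters=10):
    n_users, n_items = R.shape
    P = (R > 0).astype(float)                 # binary preferences
    C = 1.0 + alpha * R                       # confidence levels
    X = np.random.rand(n_users, factors) * 0.1
    Y = np.random.rand(n_items, factors) * 0.1
    for _ in range(iters):
        for u in range(n_users):              # solve each user given items
            Cu = np.diag(C[u])
            A = Y.T @ Cu @ Y + reg * np.eye(factors)
            X[u] = np.linalg.solve(A, Y.T @ Cu @ P[u])
        for i in range(n_items):              # solve each item given users
            Ci = np.diag(C[:, i])
            A = X.T @ Ci @ X + reg * np.eye(factors)
            Y[i] = np.linalg.solve(A, X.T @ Ci @ P[:, i])
    return X, Y

R = np.array([[3., 0., 1.], [0., 2., 0.], [1., 0., 4.]])  # e.g. watch counts
X, Y = implicit_als(R)
print(np.round(X @ Y.T, 2))                   # predicted preference scores
```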
Conference Paper
Full-text available
Process mining has proven to be a valuable tool for analyzing operational process executions based on event logs. Existing techniques perform well on structured processes, but still have problems discovering and visualizing less structured ones. Unfortunately, process mining is most interesting in domains requiring flexibility. A typical example would be the treatment process in a hospital, where it is vital that people can deviate to deal with changing circumstances. Here it is useful to provide insights into the actual processes, but at the same time there is a lot of diversity, leading to complex models that are difficult to interpret. This paper presents an approach using trace clustering, i.e., the event log is split into homogeneous subsets and for each subset a process model is created. We demonstrate that our approach, based on log profiles, can improve process mining results in real flexible environments. To illustrate this we present a real-life case study.
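A stripped-down version of profile-based trace clustering: represent each trace by its activity-frequency profile, cluster the profiles, and discover one model per sublog. The example assumes scikit-learn is available and uses only frequency profiles, which are simpler than the log profiles used in the paper.

```python
# Sketch of trace clustering on activity-frequency profiles.
from collections import Counter
from sklearn.cluster import KMeans

traces = [["a", "b", "c"], ["a", "b", "b", "c"], ["x", "y"], ["x", "y", "y"]]
activities = sorted({a for t in traces for a in t})

# One frequency vector per trace over the shared activity alphabet.
profiles = [[Counter(t)[a] for a in activities] for t in traces]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(profiles)

for k in set(labels):
    sublog = [t for t, l in zip(traces, labels) if l == k]
    print(f"cluster {k}: {sublog}")   # discover one model per sublog
```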
Article
Full-text available
Recommender systems have been evaluated in many, often incomparable, ways. In this article, we review the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, we present empirical results from the analysis of various accuracy metrics on one content domain where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalency class were strongly correlated, while metrics from different equivalency classes were uncorrelated.
Conference Paper
Full-text available
Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users. In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
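The two central ingredients, item-item cosine similarity and weighted-sum prediction, fit in a short sketch on a toy ratings matrix:

```python
# Minimal item-based collaborative filtering: cosine similarities between
# item column vectors, then a weighted-sum prediction.
import numpy as np

R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])                # users x items, 0 = unrated

def item_cosine(R):
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    return (R.T @ R) / np.outer(norms, norms)

def predict(R, S, user, item):
    rated = np.nonzero(R[user])[0]              # items the user has rated
    w = S[item, rated]
    return float(w @ R[user, rated] / (np.abs(w).sum() or 1.0))

S = item_cosine(R)
print(round(predict(R, S, user=0, item=2), 2))  # score for unrated item 2
```

Because the item-item similarity matrix can be precomputed offline, each online prediction touches only the items a user has already rated, which is the source of the scalability advantage over user-based approaches.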
Article
Full-text available
Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and, typically, there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we have developed techniques for discovering workflow models. The starting point for such techniques is a so-called "workflow log" containing information about the workflow process as it is actually being executed. We present a new algorithm to extract a process model from such a log and represent it in terms of a Petri net. However, we also demonstrate that it is not possible to discover arbitrary workflow processes. We explore a class of workflow processes that can be discovered. We show that the α-algorithm can successfully mine any workflow represented by a so-called SWF-net.
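Before constructing the Petri net, the alpha-algorithm derives footprint relations from the log: direct succession, causality, parallelism and choice. The sketch below derives these relations from a toy log and leaves out the net construction itself.

```python
# Sketch of the footprint relations underlying the alpha-algorithm.
from itertools import product

log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
acts = sorted({x for t in log for x in t})
# Direct succession: y follows x immediately in some trace.
direct = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def relation(x, y):
    if (x, y) in direct and (y, x) not in direct: return "->"   # causality
    if (y, x) in direct and (x, y) not in direct: return "<-"
    if (x, y) in direct and (y, x) in direct:     return "||"   # parallel
    return "#"                                                   # choice

for x, y in product(acts, acts):
    if x < y:
        print(x, relation(x, y), y)
```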
Chapter
The majority of clustering approaches have focused on static data. However, a wide variety of recent applications and research issues in big data mining require dealing with continuous, possibly infinite streams of data arriving at high velocity. Web traffic data, surveillance data, sensor measurements, and stock trading are only some examples of these steadily growing applications. Additionally, as the growth of data volumes is accompanied by a similar expansion in their dimensionalities, clusters cannot be expected to appear fully when considering all attributes together. Subspace clustering is a general approach that solves this issue by automatically finding the hidden clusters within different subsets of the attributes rather than considering all attributes together. In this chapter, novel methods for an efficient subspace clustering of high-dimensional big data streams are presented. Approaches that efficiently combine the anytime clustering concept with the stream subspace clustering paradigm are discussed. Additionally, efficient and adaptive density-based clustering algorithms are presented for high-dimensional data streams. A novel open-source assessment framework and evaluation measures are additionally presented for subspace stream clustering.
Conference Paper
The need to support advanced analytics on Big Data is driving data scientists' interest toward massively parallel distributed systems and software platforms, such as Map-Reduce and Spark, that make their scalable utilization possible. However, when complex data mining algorithms are required, their fully scalable deployment on such platforms faces a number of technical challenges that grow with the complexity of the algorithms involved. Thus, algorithms that were originally designed as sequential must often be redesigned in order to effectively use the distributed computational resources. In this paper, we explore these problems, and then propose a solution which has proven to be very effective on the complex hierarchical clustering algorithm CLUBS+. By using four stages of successive refinements, CLUBS+ delivers high-quality clusters of data grouped around their centroids, working in a totally unsupervised fashion. Experimental results confirm the accuracy and scalability of CLUBS+.
Chapter
We present, first of all, a broader perspective on the essence of problem solving and decision-making in complex real-world environments. Then, we address the special role of decision support, and decision support systems, as those solutions are generally considered the most promising for solving all kinds of nontrivial problems in our context. Finally, we analyze a vital need for tools and techniques that could involve elements of creativity in problem solving and decision-making, and systems for their support. We advocate the need to grasp creativity from many points of view, starting from its role in solving problems in an ever more complex world, and its role as the only means that can yield an added value and hence help attain innovativeness and competitiveness. After those general remarks we present a critical overview of the papers in this volume, peer reviewed, carefully selected and subdivided into six topical parts, together with remarks on the scope and an outline of the 8th International Conference on Knowledge, Information and Creativity Support Systems (KICSS'2013), held in Kraków and Wieliczka, Poland, in November 2013. These are placed in the context of the historical development of this conference series and of the increased interest of a large community of researchers, scholars and practitioners that has been decisive for its development. The contents and main contributions of accepted papers are briefly outlined in the order in which they appear in this volume. Some general remarks and acknowledgements are also included.
Chapter
The increasing importance of the Web as a medium for electronic and business transactions has served as a driving force for the development of recommender systems technology. An important catalyst in this regard is the ease with which the Web enables users to provide feedback about their likes or dislikes. For example, consider a scenario of a content provider such as Netflix. In such cases, users are able to easily provide feedback with a simple click of a mouse. A typical methodology to provide feedback is in the form of ratings, in which users select numerical values from a specific evaluation system (e.g., five-star rating system) that specify their likes and dislikes of various items.
Book
This is the second edition of Wil van der Aalst’s seminal book on process mining, which now discusses the field also in the broader context of data science and big data approaches. It includes several additions and updates, e.g. on inductive mining techniques, the notion of alignments, a considerably expanded section on software tools and a completely new chapter of process mining in the large. It is self-contained, while at the same time covering the entire process-mining spectrum from process discovery to predictive analytics. After a general introduction to data science and process mining in Part I, Part II provides the basics of business process modeling and data mining necessary to understand the remainder of the book. Next, Part III focuses on process discovery as the most important process mining task, while Part IV moves beyond discovering the control flow of processes, highlighting conformance checking, and organizational and time perspectives. Part V offers a guide to successfully applying process mining in practice, including an introduction to the widely used open-source tool ProM and several commercial products. Lastly, Part VI takes a step back, reflecting on the material presented and the key open challenges. Overall, this book provides a comprehensive overview of the state of the art in process mining. It is intended for business process analysts, business consultants, process managers, graduate students, and BPM researchers.
Article
Predicting business process behaviour, such as the final state of a running process, the remaining time to completion or the next activity of a running process, is an important aspect of business process management. Motivated by research in natural language processing, this paper describes an application of deep learning with recurrent neural networks to the problem of predicting the next event in a business process. This is both a novel method in process prediction, which has largely relied on explicit process models, and also a novel application of deep learning methods. The approach is evaluated on two real datasets and our results surpass the state-of-the-art in prediction precision. The paper offers recommendations for researchers and practitioners and points out areas for future applications of deep learning in business process management.
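In the same spirit, a compact next-event predictor can be built from activity-prefix pairs and a small LSTM; the architecture sizes, window length and toy log below are arbitrary assumptions rather than the paper's configuration. A PyTorch sketch:

```python
# Sketch of next-event prediction with an LSTM over activity prefixes.
import torch
import torch.nn as nn

acts = ["register", "check", "approve", "reject", "archive", "<pad>"]
idx = {a: i for i, a in enumerate(acts)}
traces = [["register", "check", "approve", "archive"],
          ["register", "check", "reject", "archive"]]

# Build (prefix -> next activity) training pairs, left-padded to length 3.
X, y = [], []
for t in traces:
    for i in range(1, len(t)):
        prefix = t[max(0, i - 3):i]
        prefix = ["<pad>"] * (3 - len(prefix)) + prefix
        X.append([idx[a] for a in prefix])
        y.append(idx[t[i]])
X, y = torch.tensor(X), torch.tensor(y)

class NextEvent(nn.Module):
    def __init__(self, n_acts, emb=8, hidden=16):
        super().__init__()
        self.emb = nn.Embedding(n_acts, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_acts)
    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h[:, -1])                # predict from last hidden state

model = NextEvent(len(acts))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()
print(acts[model(X[:1]).argmax().item()])        # next event after first prefix
```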
Book
This book comprehensively covers the topic of recommender systems, which provide personalized recommendations of products or services to users based on their previous searches or purchases. Recommender system methods have been adapted to diverse applications including query log mining, social networking, news recommendations, and computational advertising. This book synthesizes both fundamental and advanced topics of a research area that has now reached maturity. The chapters of this book are organized into three categories: - Algorithms and evaluation: These chapters discuss the fundamental algorithms in recommender systems, including collaborative filtering methods, content-based methods, knowledge-based methods, ensemble-based methods, and evaluation. - Recommendations in specific domains and contexts: the context of a recommendation can be viewed as important side information that affects the recommendation goals. Different types of context such as temporal data, spatial data, social data, tagging data, and trustworthiness are explored. - Advanced topics and applications: Various robustness aspects of recommender systems, such as shilling systems, attack models, and their defenses are discussed. In addition, recent topics, such as learning to rank, multi-armed bandits, group systems, multi-criteria systems, and active learning systems, are introduced together with applications. Although this book primarily serves as a textbook, it will also appeal to industrial practitioners and researchers due to its focus on applications and references. Numerous examples and exercises have been provided, and a solution manual is available for instructors. About the Author: Charu C. Aggarwal is a Distinguished Research Staff Member (DRSM) at the IBM T.J. Watson Research Center in Yorktown Heights, New York. He completed his B.S. from IIT Kanpur in 1993 and his Ph.D. from the Massachusetts Institute of Technology in 1996. He has published more than 300 papers in refereed conferences and journals, and has applied for or been granted more than 80 patents. He is author or editor of 15 books, including a textbook on data mining and a comprehensive book on outlier analysis. Because of the commercial value of his patents, he has thrice been designated a Master Inventor at IBM. He has received several internal and external awards, including the EDBT Test-of-Time Award (2014) and the IEEE ICDM Research Contributions Award (2015). He has also served as program or general chair of many major conferences in data mining. He is a fellow of the SIAM, ACM, and the IEEE, for “contributions to knowledge discovery and data mining algorithms.”
Book
From the Preface: "This volume contains some carefully selected papers presented at the 8th International Conference on Knowledge, Information and Creativity Support Systems KICCS’2013, which was held in Kraków and Wieliczka, Poland in November 2013. In most cases the papers are extended versions with newer results added, representing virtually all topics covered by the conference. The KICCS’2013 focus theme, “Looking into the Future of Creativity and Decision Support Systems”, clearly indicates that the growing complexity calls for some deeper and insightful discussions about the future but, obviously, complemented with an exposition of modern present developments that have proven their power and usefulness. Following this theme, the list of topics presented in this volume include some future-oriented fields of research, such as anticipatory networks and systems, foresight support systems, relevant newly-emerging applications, exemplified by autonomous creative systems. Special attention was also given to cognitive and collaborative aspects of creativity." The book is available from: http://www.springer.com/gp/book/9783319190891
Chapter
In recent years, data science emerged as a new and important discipline. It can be viewed as an amalgamation of classical disciplines like statistics, data mining, databases, and distributed systems. Existing approaches need to be combined to turn abundantly available data into value for individuals, organizations, and society. Moreover, new challenges have emerged, not just in terms of size (“Big Data”) but also in terms of the questions to be answered. This book focuses on the analysis of behavior based on event data. Process mining techniques use event data to discover processes, check compliance, analyze bottlenecks, compare process variants, and suggest improvements. In later chapters, we will show that process mining provides powerful tools for today’s data scientist. However, before introducing the main topic of the book, we provide an overview of the data science discipline.
Book
This second edition of a well-received text, with 20 new chapters, presents a coherent and unified repository of recommender systems’ major concepts, theories, methodologies, trends, and challenges. A variety of real-world applications and detailed case studies are included. In addition to wholesale revision of the existing chapters, this edition includes new topics including: decision making and recommender systems, reciprocal recommender systems, recommender systems in social networks, mobile recommender systems, explanations for recommender systems, music recommender systems, cross-domain recommendations, privacy in recommender systems, and semantic-based recommender systems. This multi-disciplinary handbook involves world-wide experts from diverse fields such as artificial intelligence, human-computer interaction, information retrieval, data mining, mathematics, statistics, adaptive user interfaces, decision support systems, psychology, marketing, and consumer behavior. Theoreticians and practitioners from these fields will find this reference to be an invaluable source of ideas, methods and techniques for developing more efficient, cost-effective and accurate recommender systems.
Chapter
Recommender Systems (RSs) are software tools and techniques that provide suggestions for items that are most likely of interest to a particular user. In this introductory chapter, we briefly discuss basic RS ideas and concepts. Our main goal is to delineate, in a coherent and structured way, the chapters included in this handbook. Additionally, we aim to help the reader navigate the rich and detailed content that this handbook offers.
Article
In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic Discriminant Function) on six "real world" medical diagnostics data sets. We compare and discuss the use of AUC versus the more conventional overall accuracy and find that AUC exhibits a number of desirable properties: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreases as both AUC and the number of test samples increase; independence from the decision threshold; and invariance to a priori class probabilities. The paper concludes with the recommendation that AUC be used in preference to overall accuracy for "single number" evaluation of machine learning algorithms.
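AUC has a direct interpretation that makes a reference implementation tiny: it is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half.

```python
# AUC via the Mann-Whitney formulation: fraction of positive/negative
# pairs in which the positive example receives the higher score.
def auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))   # prints 0.75
```

Note that the computation never fixes a decision threshold and depends only on the ranking of scores, which is exactly why AUC is threshold-independent and invariant to class priors.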