Seppe KLM vanden Broucke
Ghent University | UGhent · Department of Management Information and Operations Management
PhD in Applied Economics
About
129 Publications
33,560 Reads
2,119 Citations
Introduction
Seppe vanden Broucke received his PhD in Applied Economics from KU Leuven, Belgium, in 2014. Currently, Seppe is an assistant professor in the Department of Business Informatics at UGent (Belgium) and a lecturer at KU Leuven (Belgium). Seppe's research interests include business data mining and analytics, machine learning, process management, and process mining. His work has been published in well-known international journals and presented at top conferences.
Additional affiliations
January 2011 - January 2016
Publications (129)
This study delves into computational sustainability, leveraging AI methods to address sustainable consumption and production challenges. Utilizing historical purchasing data from a prominent Icelandic supermarket chain spanning from 2018 to 2023, the study explores various dimensions of data aggregation, including macro-categories, micro-categorie...
The geospatial domain increasingly relies on data-driven methodologies to extract actionable insights from the growing volume of available data. Despite the effectiveness of tree-based models in capturing complex relationships between features and targets, they fall short when it comes to considering spatial factors. This limitation arises from the...
Tree-based methods have become popular for spatial prediction tasks due to their high accuracy in dividing input spaces into regions with different predictions. However, traditional decision trees perform univariate splits, resulting in rectangular regions. To address this limitation and provide more intuitive and accurate decision boundaries for s...
Developing LSTM neural networks that can accurately predict the future trajectory of ongoing cases and their remaining runtime is an active area of research in predictive process monitoring. In this work a novel complete remaining trace prediction (CRTP) LSTM is proposed. This model is trained to directly predict the complete remaining trace and ru...
Predicting house prices is a challenging task that many researchers have attempted to address. As accurate house prices allow better informing parties in the real estate market, improving housing policies and real estate appraisal, a comprehensive overview of house price prediction strategies is valuable for both research and society. In this work,...
Churn prediction on imbalanced data is a challenging task. Ensemble solutions exhibit good performance in dealing with class imbalance but fail to improve the profit-oriented goal in churn prediction. This paper attempts to develop a new bagging-based selective ensemble paradigm for profit-oriented churn prediction in class imbalance scenarios. The...
Learning from positive and unlabeled data, or PU learning, is the setting in which a binary classifier can only train from positive and unlabeled instances, the latter containing both positive as well as negative instances. Many PU applications, e.g., fraud detection, are also characterized by class imbalance, which creates a challenging setting. N...
A lot of recent literature on outcome-oriented predictive process monitoring focuses on using models from machine and deep learning. In this literature, it is assumed the outcome labels of the historical cases are all known. However, in some cases, the labelling of cases is incomplete or inaccurate. For instance, you might only observe negative cus...
Several machine learning applications, including genetics and fraud detection, suffer from incomplete label information. In such applications, a classifier can only train from positive and unlabeled (PU) examples in which the unlabeled data consist of both positive and negative examples. Despite a substantial presence of PU learning in the literatu...
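A minimal sketch of the PU setting described above may help make it concrete. The snippet below uses the classic Elkan–Noto correction with scikit-learn; it is an illustration of PU learning in general, not the method proposed in this work, and the variable names (X, s) are placeholders.

```python
# Minimal PU-learning sketch (Elkan & Noto style correction); not the paper's method.
# Assumes: X (feature matrix) and s (1 = labeled positive, 0 = unlabeled) as numpy arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def pu_probabilities(X, s):
    # Step 1: train a "non-traditional" classifier to predict the label indicator s.
    X_tr, X_val, s_tr, s_val = train_test_split(X, s, test_size=0.2, random_state=0)
    g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)

    # Step 2: estimate c = P(s=1 | y=1) as the mean score on held-out labeled positives.
    c = g.predict_proba(X_val[s_val == 1])[:, 1].mean()

    # Step 3: correct the scores to approximate P(y=1 | x) for all instances.
    return np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)
```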
We research how deep learning convolutional neural networks can be used to automatically classify the unique data set of black-and-white naval ship images from the Wright and Logan photographic collection held by the National Museum of the Royal Navy. We contrast various types of deep learning methods: pretrained models such as ConvNeXt, ResNet...
Various methods using machine and deep learning have been proposed to tackle different tasks in predictive process monitoring, forecasting for an ongoing case e.g. the most likely next event or suffix, its remaining time, or an outcome-related variable. Recurrent neural networks (RNNs), and more specifically long short-term memory nets (LSTMs), sta...
Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset contai...
Class imbalance is a critical issue in customer classification, for which a plethora of techniques have been proposed in the current body of literature. In particular, generative adversarial network (GAN)-based oversampling can capture the true data distribution of minority class samples and generate new samples, and this approach has demonstrated...
Currently, the state of a house is typically assessed by an expert, which is time and resource intensive. Therefore, an automatic assessment could have economic, social and ecological benefits. Hence, this study presents a binary classification model using transfer learning to classify Google Street View images of houses. For this purpose, a three-...
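As an illustration of the transfer-learning setup mentioned in this abstract, the sketch below builds a binary image classifier on top of a frozen ImageNet-pretrained backbone in Keras. The backbone choice, hyperparameters, and dataset variables are assumptions for the example and are not taken from the paper.

```python
# Illustrative transfer-learning setup for binary image classification (not the paper's exact model).
import tensorflow as tf

# Frozen ImageNet-pretrained backbone with a new binary classification head.
backbone = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. "good" vs. "poor" house state
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds / val_ds are placeholders
```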
Predictive process monitoring concerns itself with the prediction of ongoing cases in (business) processes. Prediction tasks typically focus on remaining time, outcome, next event or full case suffix prediction. Various methods using machine and deep learning have been proposed for these tasks in recent years. Especially recurrent neural networks (...
Within the field of process mining, several different trace clustering approaches exist for partitioning traces or process instances into similar groups. Typically, this partitioning is based on certain patterns or similarity between the traces, or driven by the discovery of a process model for each cluster. The main drawback of these techniques, h...
In order to improve the performance of any machine learning model, it is important to focus more on the data itself instead of continuously developing new algorithms. This is exactly the aim of feature engineering. It can be defined as the clever engineering of data hereby exploiting the intrinsic bias of the machine learning technique to our benef...
Conformance checking is concerned with the task of assessing the quality of process models describing actual behavior captured in an event log across different dimensions. In this paper, a novel approach for obtaining the degree of recall and precision between a process model and event log is introduced. The approach relies on the generation of a s...
Developing accurate analytical credit scoring models has become a major focus for financial institutions. For this purpose, numerous classification algorithms have been proposed for credit scoring. However, the application of deep learning algorithms for classification has been largely ignored in the credit scoring literature. The main motivation f...
Conformance checking describes process mining techniques used to compare an event log and a corresponding process model. In this paper, we propose an entirely new approach to conformance checking based on neural network-based embeddings. These embeddings are vector representations of every activity/task present in the model and log, obtained via ac...
Calibration is a technique used to obtain accurate probability estimation for classification problems in real applications. Class imbalance can create considerable challenges in obtaining accurate probabilities for calibration methods. However, previous research has paid little attention to this issue. In this paper, we present an experimental inve...
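To make the calibration setting tangible, the sketch below applies standard post-hoc calibration (Platt scaling via scikit-learn) to a classifier trained on an imbalanced toy dataset and compares Brier scores. It only illustrates the general technique; the data, model, and settings are assumptions, not the paper's experimental setup.

```python
# Post-hoc probability calibration with scikit-learn on imbalanced data (illustration only).
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Imbalanced toy data: roughly 5% positives.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

raw = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
calibrated = CalibratedClassifierCV(RandomForestClassifier(n_estimators=200, random_state=0),
                                    method="sigmoid", cv=3).fit(X_tr, y_tr)

print("Brier (raw):       ", brier_score_loss(y_te, raw.predict_proba(X_te)[:, 1]))
print("Brier (calibrated):", brier_score_loss(y_te, calibrated.predict_proba(X_te)[:, 1]))
```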
Representation learning in dynamic networks has gained increasingly more attention due to its promising applicability. In the literature, we can find two popular approaches that have been adapted to dynamic networks: random-walk based techniques and graph-autoencoders. Despite their popularity, no work has compared them on well-known datasets. We fill...
Separating decision modelling from the process modelling concern has recently gained significant support in the literature, as incorporating both concerns into a single model impairs the scalability, maintainability, flexibility and understandability of both processes and decisions. Most notably, the introduction of the Decision Model and Notation (DMN) s...
Generating insights and value from data has become an important asset for organizations. At the same time, the need for experts in analytics is increasing and the number of analytics applications is growing. Recently, a new trend has emerged, i.e. analytics-as-a-service platforms, which make it easier to apply analytics for both novice and expert u...
This report illustrates the interactions between decisions and processes in a real-life enriched event log revolving around a bank loan application and approval process. The decision models are represented using the recently introduced Decision Model and Notation (DMN) standard of the Object Management Group (OMG). For this purpose, we capitalise o...
Imbalanced classification is a challenging issue in data mining and machine learning, for which a large number of solutions have been proposed. In this paper, we introduce an R library called IRIC, which integrates a wide set of solutions for imbalanced binary classification. IRIC not only provides a new implementation of some state-of-the-art techniqu...
The analysis of business processes is a multifaceted problem that is comprised of analysing both activities’ workflow, as well as the decisions that are made throughout that workflow. In process mining, the automated discovery of process models from event data, a strong emphasis can be found towards discovering this workflow, as well as how data in...
The research area of process mining concerns itself with knowledge discovery from event logs, containing recorded traces of executions as stored by process aware information systems. Over the past decade, research in process mining has increasingly focused on predictive process monitoring to provide businesses with valuable information in order to...
The aspect of collaboration is gaining a considerable amount of importance in current logistics operations. The large number of dynamics that arise in collaborative logistics processes with numerous complexities and variations can make the modelling of such collaborative logistics processes a challenging task. Hence, a systematic modelling approach...
Principles of Database Management, by Wilfried Lemahieu (Cambridge Core: Knowledge Management, Databases and Data Mining).
When content consumers explicitly judge content positively, we consider them to be engaged. Unfortunately, explicit user evaluations are difficult to collect, as they require user effort. Therefore, we propose to use device interactions as implicit feedback to detect engagement. We assess the usefulness of swipe interactions on tablets for predicti...
Up until now, we’ve been focusing a lot on the “web scraping” part of this book. We now take a step back and link the concepts you’ve learned to the general field of data science, paying particular attention to managerial issues that will arise when you’re planning to incorporate web scraping in a data science project. This chapter also provides a...
Together with HTML and CSS, JavaScript forms the third and final core building block of the modern web. We’ve already seen JavaScript appearing occasionally throughout this book, and it’s time that we take a closer look at it. As we’ll soon see in this chapter, our sturdy requests plus Beautiful Soup combination is no longer a viable approach to sc...
We’ve already seen most of the core building blocks that make up the modern web: HTTP, HTML, and CSS. However, we’re not completely finished with HTTP yet. So far, we’ve only been using one of HTTP’s request “verbs” or “methods”: “GET”. This chapter will introduce you to the other methods HTTP provides, starting with the “POST” method that is commo...
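For readers unfamiliar with POST, the snippet below shows how submitting form data differs from the GET requests used earlier; it is a generic example, with httpbin.org used as an illustrative echo endpoint rather than a site from the book.

```python
# Submitting form data with an HTTP POST request (httpbin.org simply echoes the request back).
import requests

response = requests.post("https://httpbin.org/post",
                         data={"username": "demo", "query": "web scraping"})
print(response.status_code)     # 200 on success
print(response.json()["form"])  # the form fields as the server received them
```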
You’re now ready to get started with your own web scraping projects. This chapter wraps up by providing some closing topics. First, we provide an overview of other helpful tools and libraries you might wish to use in the context of web scraping, followed by a summary of best practices and tips to consider when web scraping.
So far, the examples in the book have been quite simple in the sense that we only scraped (mostly) a single page. When writing web scrapers, however, there are many occasions where you’ll wish to scrape multiple pages and even multiple websites. In this context, the name “web crawler” is oftentimes used, as it will “crawl” across a site or even the...
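A minimal crawler along the lines described above can be sketched with requests and Beautiful Soup; the start URL, page limit, and same-domain rule below are placeholder choices for illustration, not code from the book.

```python
# Minimal single-site breadth-first crawler sketch (start URL and limits are placeholders).
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def crawl(start_url, max_pages=20):
    seen, queue = set(), [start_url]
    domain = urlparse(start_url).netloc
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that fail to load
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == domain:  # stay on the same site
                queue.append(link)
    return seen
```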
This chapter includes several larger examples of web scrapers. Contrary to most of the examples showcased during the previous chapters, the examples here serve a twofold purpose. First, they showcase some more examples using real-life websites instead of a curated, safe environment. The reason why we haven’t used many real-life examples so far is d...
In this chapter, we introduce one of the core building blocks that makes up the web: the HyperText Transfer Protocol (HTTP), after having provided a brief introduction to computer networks in general. We then introduce the Python requests library, which we’ll use to perform HTTP requests and effectively start retrieving websites with Python. The ch...
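A basic GET request with the requests library, as introduced in this chapter, looks roughly as follows; the URL is just an example.

```python
# Performing a basic HTTP GET request with the requests library (URL is an example).
import requests

response = requests.get("https://www.example.com")
print(response.status_code)               # e.g. 200
print(response.headers["Content-Type"])   # response headers sent by the server
print(response.text[:200])                # first part of the returned HTML
```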
So far we have discussed the basics of HTTP and how you can perform HTTP requests in Python using the requests library. However, since most web pages are formatted using the Hypertext Markup Language (HTML), we need to understand how to extract information from such pages. As such, this chapter introduces you to HTML, as well as another core buildi...
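Extracting information from such HTML pages is typically done with Beautiful Soup on top of requests, as in the short sketch below; the page and the selectors are illustrative, not taken from the chapter.

```python
# Parsing HTML and extracting elements with Beautiful Soup (page and selectors are illustrative).
import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.example.com").text
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)                  # the page title
for a in soup.find_all("a", href=True):   # all hyperlinks on the page
    print(a["href"], "->", a.get_text(strip=True))
```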
The development of new data analytical methods remains a crucial factor in the combat against insurance fraud. Methods rooted in the research field of anomaly detection are considered as promising candidates for this purpose. Commonly, a fraud data set contains both numeric and nominal attributes, where, due to the ease of expressiveness, the latte...
This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors...
To detect churners in a vast customer base, as is the case with telephone service providers, companies heavily rely on predictive churn models to remain competitive in a saturated market. In previous work, the expected maximum profit measure for customer churn (EMPC) has been proposed in order to determine the most profitable churn model. However,...
Customer retention campaigns increasingly rely on predictive models to detect potential churners in a vast customer base. From the perspective of machine learning, the task of predicting customer churn can be presented as a binary classification problem. Using data on historic behavior, classification algorithms are built with the purpose of accura...
The interest of integrating decision analysis approaches with the automated discovery of processes from data has seen a vast surge over the past few years. Most notably the introduction of the Decision Model and Notation (DMN) standard by the Object Management Group has provided a suitable solution for filling the void of decision representation in...