
Bart Baesens
- PhD
- Professor (Full) at KU Leuven
Professor of Data Science at KU Leuven (Belgium); Lecturer at University of Southampton (United Kingdom)
About
Publications: 456
Reads: 235,401
Citations: 20,628
Introduction
My research focusses on AI, analytics, data science, credit risk, fraud detection and model risk.
Publications (456)
Keywords are essential to the searchability and therefore discoverability of museum and archival collections in the modern world. Without them, the collection management systems (CMS) and online collections these cultural organisations rely on to record, organise, and make their collections accessible, do not operate efficiently. However, generatin...
This study delves into computational sustainability, leveraging AI methods to address sustainable consumption and production challenges. Utilizing historical purchasing data from a prominent Icelandic supermarket chain spanning from 2018 to 2023, the study explores various dimensions of data aggregation, including macro-categories, micro-categorie...
Large Language Models (LLMs) excel at providing information acquired during pretraining on large-scale corpora and following instructions through user prompts. This study investigates whether the quality of LLM responses varies depending on the demographic profile of users. Considering English as the global lingua franca, along with the diversity o...
The recent surge in foundation models for natural language processing and computer vision has fueled innovation across various domains. Inspired by this progress, we explore the potential of foundation models for time-series forecasting in smart agriculture, a field often plagued by limited data availability. Specifically, this work presents a nove...
Money laundering presents a pervasive challenge, burdening society by financing illegal activities. To more effectively combat and detect money laundering, the use of network information is increasingly being explored, exploiting the fact that money laundering necessarily involves interconnected parties. This has led to a surge in literature on network ana...
There has been an increasing interest in fraud detection methods, driven by new regulations and by the financial losses linked to fraud. One of the state-of-the-art methods to fight fraud is network analytics. Network analytics leverages the interactions between different entities to detect complex patterns that are indicative of fraud. However, ne...
To tackle the societal and person-specific adverse consequences of long-term unemployment, many public employment services (PES) have implemented data-driven profiling systems to promptly identify vulnerable job seekers. More recently, PES increasingly rely on more complex machine learning (ML) models due to their enhanced accuracy. However, increa...
Predicting the demographics of Twitter users has become a problem with a large interest in computational social sciences. However, the limited amount of public datasets with ground truth labels and the tremendous costs of hand-labeling make this task particularly challenging. Recently, programmatic weak supervision has emerged as a new framework to...
Leveraging network information for predictive modeling has become widespread in many domains. Within the realm of referral and targeted marketing, influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the continuous evolution of customer-brand relationships. In this pap...
Leveraging network information for predictive modeling has become widespread in many domains. Within the realm of referral and targeted marketing, influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the ongoing development of customer-brand relationships. To elaborate...
Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC is equal to the concordance probability, a frequently used measure to evaluate the discriminatory po...
IoT data is a central element in the successful digital transformation of agriculture. However, IoT data comes with its own set of challenges, such as the risk of data contamination due to rogue sensors. A sensor is considered rogue when it provides incorrect measurements over time. To ensure correct analytical results, an essential preprocessing ste...
Mapping innovation in companies for the purpose of official statistics is usually done through business surveys. However, this traditional approach faces several drawbacks like a lack of responses, response bias, low frequency, and high costs. Alternatively, text-based models trained on web-scraped text from company websites have been developed to...
We research how deep learning convolutional neural networks can be used to automatically classify the unique data set of black-and-white naval ship images from the Wright and Logan photographic collection held by the National Museum of the Royal Navy. We contrast various types of deep learning methods: pretrained models such as ConvNeXt, ResNet...
Leveraging network information for prediction tasks has become a common practice in many domains. Being an important part of targeted marketing, influencer detection can potentially benefit from incorporating dynamic network representation. In this work, we investigate different dynamic Graph Neural Networks (GNNs) configurations for influencer det...
Fraud is as old as humankind and appears in many types and forms. Popular examples are credit card fraud, tax evasion, identity theft, insurance fraud, counterfeit, click fraud, anti-money laundering, and payment transaction fraud. In earlier research we defined fraud as an uncommon, well-considered, imperceptibly concealed, time-evolving, and care...
Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset contai...
Advanced fraud detection systems leverage the digital traces from (credit-card) transactions to detect fraudulent activity in future transactions. Recent research in fraud detection has focused primarily on data analytics combined with manual feature engineering, which is tedious, expensive and requires considerable domain expertise. Furthermore, t...
Firms and organisations cannot exist without customers. They essentially constitute the key ingredient to make a firm profitable and add shareholder and societal value. Despite recent technological advances in both data storage as well as processing and analysis, many small to large-scale firms are still struggling to quantify customer value, optim...
Machine maintenance is a challenging operational problem, where the goal is to plan sufficient preventive maintenance to avoid machine failures and overhauls. Maintenance is often imperfect in reality and does not make the asset as good as new. Although a variety of imperfect maintenance policies have been proposed in the literature, these rely on...
Nowadays, businesses in many industries face an increasing flow of data and information. Data are at the core of the decision-making process, hence it is vital to ensure that the data are of high quality and no noise is present. Outlier detection methods aim to find unusual patterns in data and have applications in many practical domain...
A central problem in business concerns the optimal allocation of limited resources to a set of available tasks, where the payoff of these tasks is inherently uncertain. In credit card fraud detection, for instance, a bank can only assign a small subset of transactions to their fraud investigations team. Typically, such problems are solved using a c...
Predictive models are increasingly being used to optimize decision-making and minimize costs. A conventional approach is predict-then-optimize: first, a predictive model is built; then, this model is used to optimize decision-making. A drawback of this approach, however, is that it only incorporates costs in the second stage. Conversely, the predic...
Within the field of process mining, several different trace clustering approaches exist for partitioning traces or process instances into similar groups. Typically, this partitioning is based on certain patterns or similarity between the traces, or driven by the discovery of a process model for each cluster. The main drawback of these techniques, h...
To improve the performance of any machine learning model, it is important to focus more on the data itself instead of continuously developing new algorithms. This is exactly the aim of feature engineering. It can be defined as the clever engineering of data, thereby exploiting the intrinsic bias of the machine learning technique to our benef...
A major challenge when trying to detect fraud is that the fraudulent activities form a minority class that makes up a very small proportion of the data set. Detecting fraud in an imbalanced data set typically leads to predictions that favor the majority group, causing fraud to remain undetected. We discuss some popular oversampling techniques that...
Card transaction fraud is a growing problem affecting card holders worldwide. Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, the task of detecting fraudulent transactions is a bina...
Developing accurate analytical credit scoring models has become a major focus for financial institutions. For this purpose, numerous classification algorithms have been proposed for credit scoring. However, the application of deep learning algorithms for classification has been largely ignored in the credit scoring literature. The main motivation f...
Insurance fraud occurs when policyholders file claims that are exaggerated or based on intentional damages. This contribution develops a fraud detection strategy by extracting insightful information from the social network of a claim. First, we construct a network by linking claims with all their involved parties, including the policyholders, broke...
We investigated different take-up rates of home loans in cases in which banks offered different interest rates. If a bank can increase its take-up rates, it could possibly improve its market share. In this article, we explore empirical home loan price elasticity, the effect of loan-to-value on the responsiveness of home loan customers and whether i...
The specific nature of credit loan data requires the use of mixture cure models within the class of survival analysis tools. The constructed models allow for competing risks such as early repayment and default, and for incorporating maturity, expressed as an unsusceptible part of the population. A novel further extension of such models incorporates...
Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, the task of detecting suspicious transactions is a binary classification problem and therefore many techniques can be applied. Interp...
In the majority of executive domains, a notion of normality is involved in most strategic decisions. However, few data-driven tools that support strategic decision-making are available. We introduce and extend the use of autoencoders to provide strategically relevant granular feedback. A first experiment indicates that experts are inconsistent in t...
The telecommunication industry is a saturated market where a proper implementation of a retention campaign is critical to be competitive, since retaining a customer is cheaper than attracting a new one. Hence, it is crucial to detect customer behavioral patterns and define accurate approaches to predict potential churners. Multiple researchers have...
A major challenge when trying to detect fraud is that the fraudulent activities form a minority class that makes up a very small proportion of the data set. In most data sets, fraud typically occurs in less than 0.5% of the cases. Detecting fraud in such a highly imbalanced data set typically leads to predictions that favor the majority group, caus...
Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both st...
Choosing the technique that is best at forecasting your data is a problem that arises in any forecasting application. Decades of research have resulted in an enormous number of forecasting methods that stem from statistics, econometrics and machine learning (ML), which leads to a very difficult and elaborate choice to make in any forecasting...
Globally, two billion people and more than half of the poorest adults do not use formal financial services. Consequently, there is increased emphasis on developing financial technology that can facilitate access to financial products for the unbanked. In this regard, smartphone-based microlending has emerged as a potential solution to enhance finan...
Relational learning in networked data has been shown to be effective in a number of studies. Relational learners, composed of relational classifiers and collective inference methods, enable the inference of nodes in a network given the existence and strength of links to other nodes. These methods have been adapted to predict customer churn in telec...
Social network analytics methods are being used in the telecommunication industry to predict customer churn with great success. In particular it has been shown that relational learners adapted to this specific problem enhance the performance of predictive models. In the current study we benchmark different strategies for constructing a relational l...
In the mobile telecommunication industry, call networks have been used with great success to predict customer churn. These social networks are complex and rich in features, because the telecommunications operators have a lot of information about their customers. In this paper we leverage a novel framework called GraphSAGE for inductive representati...
This paper investigates different take-up rates on home loans when banks offer different interest rates. If a bank could increase its take-up rates, it could possibly improve its market share. The article explores empirical home loan price elasticity, the effect of loan-to-value (LTV) on the responsiveness of home loan customers and wh...
Generating insights and value from data has become an important asset for organizations. At the same time, the need for experts in analytics is increasing and the number of analytics applications is growing. Recently, a new trend has emerged, i.e. analytics-as-a-service platforms, that makes it easier to apply analytics both for novice and expert u...
In credit scoring, feature selection aims at removing irrelevant data to improve the performance of the scorecard and its interpretability. Standard techniques treat feature selection as a single-objective task and rely on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators may improve the quality of...
Applying social network analytics for telco churn prediction has been indispensable for almost a decade. However, in the current literature, this uptake is not reflected in significantly increased leverage of the available information that these networks convey. First, network featurization in general is a very cumbersome process due to the comp...
Accurately predicting faulty software units helps practitioners target faulty units and prioritize their efforts to maintain software quality. Prior studies use machine-learning models to detect faulty software code. We revisit past studies and point out potential improvements. Our new study proposes a revised benchmarking configuration. The config...
In credit scoring, feature selection aims at removing irrelevant data to improve the performance of the scorecard and its interpretability. Standard feature selection techniques are based on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators for model evaluation may improve the quality of scoring mod...