Bart Baesens

University of Southampton · Centre for Operational Research, Management Science and Information Systems (CORMSIS)

PhD in Applied Economic Sciences, KU Leuven, 2003

About

435
Publications
208,711
Reads
17,640
Citations
Citations since 2017
147 Research Items
10,678 Citations

Publications (435)
Preprint
Full-text available
Leveraging network information for predictive modeling has become widespread in many domains. Within the realm of referral and targeted marketing, influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the ongoing development of customer-brand relationships. To elaborate...
Article
Performance measurement is an essential task once a statistical model is created. The area under the receiver operating characteristic curve (AUC) is the most popular measure for evaluating the quality of a binary classifier. In this case, the AUC is equal to the concordance probability, a frequently used measure to evaluate the discriminatory po...
Article
Mapping innovation in companies for the purpose of official statistics is usually done through business surveys. However, this traditional approach faces several drawbacks like a lack of responses, response bias, low frequency, and high costs. Alternatively, text-based models trained on web-scraped text from company websites have been developed to...
Preprint
Full-text available
Leveraging network information for prediction tasks has become a common practice in many domains. Being an important part of targeted marketing, influencer detection can potentially benefit from incorporating dynamic network representation. In this work, we investigate different dynamic Graph Neural Networks (GNNs) configurations for influencer det...
Article
Full-text available
Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset contai...
Article
Advanced fraud detection systems leverage the digital traces from (credit-card) transactions to detect fraudulent activity in future transactions. Recent research in fraud detection has focused primarily on data analytics combined with manual feature engineering, which is tedious, expensive and requires considerable domain expertise. Furthermore, t...
Chapter
Nowadays, businesses in many industries face an increasing flow of data and information. Data are at the core of the decision-making process; hence, it is vital to ensure that the data are of high quality and free of noise. Outlier detection methods aim to find unusual patterns in data and have applications in many practical domain...
Preprint
Full-text available
A central problem in business concerns the optimal allocation of limited resources to a set of available tasks, where the payoff of these tasks is inherently uncertain. In credit card fraud detection, for instance, a bank can only assign a small subset of transactions to their fraud investigations team. Typically, such problems are solved using a c...
Article
Predictive models are increasingly being used to optimize decision-making and minimize costs. A conventional approach is predict-then-optimize: first, a predictive model is built; then, this model is used to optimize decision-making. A drawback of this approach, however, is that it only incorporates costs in the second stage. Conversely, the predic...
Preprint
Full-text available
Within the field of process mining, several different trace clustering approaches exist for partitioning traces or process instances into similar groups. Typically, this partitioning is based on certain patterns or similarity between the traces, or driven by the discovery of a process model for each cluster. The main drawback of these techniques, h...
Article
To improve the performance of any machine learning model, it is important to focus more on the data itself instead of continuously developing new algorithms. This is exactly the aim of feature engineering. It can be defined as the clever engineering of data, thereby exploiting the intrinsic bias of the machine learning technique to our benef...
Article
A major challenge when trying to detect fraud is that the fraudulent activities form a minority class that makes up a very small proportion of the data set. Detecting fraud in an imbalanced data set typically leads to predictions that favor the majority group, causing fraud to remain undetected. We discuss some popular oversampling techniques that...
Article
Full-text available
Within the field of process mining, several different trace clustering approaches exist for partitioning traces or process instances into similar groups. Typically, this partitioning is based on certain patterns or similarity between the traces, or driven by the discovery of a process model for each cluster. The main drawback of these techniques, h...
Article
Card transaction fraud is a growing problem affecting card holders worldwide. Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, the task of detecting fraudulent transactions is a bina...
Article
Developing accurate analytical credit scoring models has become a major focus for financial institutions. For this purpose, numerous classification algorithms have been proposed for credit scoring. However, the application of deep learning algorithms for classification has been largely ignored in the credit scoring literature. The main motivation f...
Article
Insurance fraud occurs when policyholders file claims that are exaggerated or based on intentional damages. This contribution develops a fraud detection strategy by extracting insightful information from the social network of a claim. First, we construct a network by linking claims with all their involved parties, including the policyholders, broke...
Article
The specific nature of credit loan data requires the use of mixture cure models within the class of survival analysis tools. The constructed models allow for competing risks such as early repayment and default, and for incorporating maturity, expressed as an unsusceptible part of the population. A novel further extension of such models incorporates...
Article
Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, the task of detecting suspicious transactions is a binary classification problem and therefore many techniques can be applied. Interp...
Article
In the majority of executive domains, a notion of normality is involved in most strategic decisions. However, few data-driven tools that support strategic decision-making are available. We introduce and extend the use of autoencoders to provide strategically relevant granular feedback. A first experiment indicates that experts are inconsistent in t...
Preprint
Insurance fraud occurs when policyholders file claims that are exaggerated or based on intentional damages. This contribution develops a fraud detection strategy by extracting insightful information from the social network of a claim. First, we construct a network by linking claims with all their involved parties, including the policyholders, broke...
Article
The telecommunication industry is a saturated market where a proper implementation of a retention campaign is critical to be competitive, since retaining a customer is cheaper than attracting a new one. Hence, it is crucial to detect customer behavioral patterns and define accurate approaches to predict potential churners. Multiple researchers have...
Preprint
Card transaction fraud is a growing problem affecting card holders worldwide. Financial institutions increasingly rely upon data-driven methods for developing fraud detection systems, which are able to automatically detect and block fraudulent transactions. From a machine learning perspective, the task of detecting fraudulent transactions is a bina...
Preprint
In the majority of executive domains, a notion of normality is involved in most strategic decisions. However, few data-driven tools that support strategic decision-making are available. We introduce and extend the use of autoencoders to provide strategically relevant granular feedback. A first experiment indicates that experts are inconsistent in t...
Preprint
Full-text available
A major challenge when trying to detect fraud is that the fraudulent activities form a minority class that makes up a very small proportion of the data set. In most data sets, fraud typically occurs in less than 0.5% of the cases. Detecting fraud in such a highly imbalanced data set typically leads to predictions that favor the majority group, caus...
Preprint
Full-text available
Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both st...
Preprint
Choosing the technique that is best at forecasting your data is a problem that arises in any forecasting application. Decades of research have resulted in an enormous number of forecasting methods that stem from statistics, econometrics and machine learning (ML), which leads to a very difficult and elaborate choice to make in any forecasting...
Preprint
Full-text available
Globally, two billion people and more than half of the poorest adults do not use formal financial services. Consequently, there is increased emphasis on developing financial technology that can facilitate access to financial products for the unbanked. In this regard, smartphone-based microlending has emerged as a potential solution to enhance finan...
Preprint
Full-text available
Relational learning in networked data has been shown to be effective in a number of studies. Relational learners, composed of relational classifiers and collective inference methods, enable the inference of nodes in a network given the existence and strength of links to other nodes. These methods have been adapted to predict customer churn in telec...
Preprint
Full-text available
Social network analytics methods are being used in the telecommunication industry to predict customer churn with great success. In particular it has been shown that relational learners adapted to this specific problem enhance the performance of predictive models. In the current study we benchmark different strategies for constructing a relational l...
Chapter
In the mobile telecommunication industry, call networks have been used with great success to predict customer churn. These social networks are complex and rich in features, because the telecommunications operators have a lot of information about their customers. In this paper we leverage a novel framework called GraphSAGE for inductive representati...
Article
Full-text available
Generating insights and value from data has become an important asset for organizations. At the same time, the need for experts in analytics is increasing and the number of analytics applications is growing. Recently, a new trend has emerged, i.e. analytics-as-a-service platforms, that makes it easier to apply analytics both for novice and expert u...
Article
Full-text available
In credit scoring, feature selection aims at removing irrelevant data to improve the performance of the scorecard and its interpretability. Standard techniques treat feature selection as a single-objective task and rely on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators may improve the quality of...
Article
Applying social network analytics for telco churn prediction has been indispensable for almost a decade. However, in the current literature, this uptake is not reflected in significantly increased leverage of the available information that these networks convey. First, network featurization in general is a very cumbersome process due to the comp...
Preprint
Full-text available
Accurately predicting faulty software units helps practitioners target faulty units and prioritize their efforts to maintain software quality. Prior studies use machine-learning models to detect faulty software code. We revisit past studies and point out potential improvements. Our new study proposes a revised benchmarking configuration. The config...
Conference Paper
Full-text available
In credit scoring, feature selection aims at removing irrelevant data to improve the performance of the scorecard and its interpretability. Standard feature selection techniques are based on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators for model evaluation may improve the quality of scoring mod...
Conference Paper
Full-text available
Globally, two billion people and more than half of the poorest adults do not use formal financial services. Consequently, there is increased emphasis on developing financial technology that can facilitate access to financial products for the unbanked. In this regard, smartphone-based microlending has emerged as a potential solution to enhance finan...
Article
Full-text available
Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both st...
Article
Full-text available
In predictive analytics and statistics, entities are frequently treated as individual actors. However, in reality this assumption is not valid. In the context of retail, similar customers will behave and thus also purchase similarly to each other. By combining their behavior in an intelligent way, based on transaction history, we can leverage these...
Conference Paper
In many real-life applications it is crucial to be able to, given a collection of link states of a network in a certain time period, accurately predict the link state of the network at a future time. This is known as dynamic link prediction, which compared to its static counterpart is more complex, as capturing the temporal characteristics is a n...
Book
Principles of Database Management, by Wilfried Lemahieu (Cambridge Core: Knowledge Management, Databases and Data Mining)
Conference Paper
Traditionally, in credit scoring, people’s banking history is analyzed to assess their creditworthiness and to determine their reliability when paying back their loans. However, as data is continuously being generated in more volume and variety than ever before, there is a foundation for new credit assessment approaches, in particular by incorporatin...
Article
When content consumers explicitly judge content positively, we consider them to be engaged. Unfortunately, explicit user evaluations are difficult to collect, as they require user effort. Therefore, we propose to use device interactions as implicit feedback to detect engagement. We assess the usefulness of swipe interactions on tablets for predicti...
Article
Social media has become a widely used marketing tool for reaching potential customers. Because of its low cost, social media marketing is especially appealing to customer-to-customer (C2C) sellers. Customers can also benefit from social media marketing by learning about products and by interacting with sellers in real time. However, a seller's mark...
Thesis
Detecting suspicious activity for money laundering through transactional activity has gained utmost importance in recent years, especially for European banks, in order to combat terrorist financing. The Anti-Money Laundering (AML) systems in place for banks to automatically detect suspicious activity have been found to be weak and ineffective. The workloa...
Chapter
Up until now, we’ve been focusing a lot on the “web scraping” part of this book. We now take a step back and link the concepts you’ve learned to the general field of data science, paying particular attention to managerial issues that will arise when you’re planning to incorporate web scraping in a data science project. This chapter also provides a...
Chapter
Together with HTML and CSS, JavaScript forms the third and final core building block of the modern web. We’ve already seen JavaScript appearing occasionally throughout this book, and it’s time that we take a closer look at it. As we’ll soon see in this chapter, our sturdy requests plus Beautiful Soup combination is no longer a viable approach to sc...
Chapter
We’ve already seen most of the core building blocks that make up the modern web: HTTP, HTML, and CSS. However, we’re not completely finished with HTTP yet. So far, we’ve only been using one of HTTP’s request “verbs” or “methods”: “GET”. This chapter will introduce you to the other methods HTTP provides, starting with the “POST” method that is commo...
Chapter
You’re now ready to get started with your own web scraping projects. This chapter wraps up by providing some closing topics. First, we provide an overview of other helpful tools and libraries you might wish to use in the context of web scraping, followed by a summary of best practices and tips to consider when web scraping.
Chapter
So far, the examples in the book have been quite simple in the sense that we only scraped (mostly) a single page. When writing web scrapers, however, there are many occasions where you’ll wish to scrape multiple pages and even multiple websites. In this context, the name “web crawler” is oftentimes used, as it will “crawl” across a site or even the...
Chapter
This chapter includes several larger examples of web scrapers. Contrary to most of the examples showcased during the previous chapters, the examples here serve a twofold purpose. First, they showcase some more examples using real-life websites instead of a curated, safe environment. The reason why we haven’t used many real-life examples so far is d...
Chapter
In this chapter, we introduce one of the core building blocks that makes up the web: the HyperText Transfer Protocol (HTTP), after having provided a brief introduction to computer networks in general. We then introduce the Python requests library, which we’ll use to perform HTTP requests and effectively start retrieving websites with Python. The ch...
Chapter
So far we have discussed the basics of HTTP and how you can perform HTTP requests in Python using the requests library. However, since most web pages are formatted using the Hypertext Markup Language (HTML), we need to understand how to extract information from such pages. As such, this chapter introduces you to HTML, as well as another core buildi...
Article
This study evaluates the most popular recommender system algorithms for use on both sides of the labor market: job recommendation and job seeker recommendation. Recent research shows the drawbacks of focusing solely on predictive power when evaluating recommender systems, which become especially prominent in job- and job seeker recommendation, wher...
Article
The success of retention campaigns in fast-moving and saturated markets, such as the telecommunication industry, often depends on accurately predicting potential churners. Being able to identify certain behavioral patterns that lead to churn is important, because it allows the organization to make arrangements for retention in a timely manner. More...
Article
The development of new data analytical methods remains a crucial factor in the combat against insurance fraud. Methods rooted in the research field of anomaly detection are considered as promising candidates for this purpose. Commonly, a fraud data set contains both numeric and nominal attributes, where, due to the ease of expressiveness, the latte...
Article
The goal of customer retention campaigns, by design, is to add value and enhance the operational efficiency of businesses. For organizations that strive to retain their customers in saturated, and sometimes fast-moving, markets such as the telecommunication and banking industries, implementing customer churn prediction models that perform well and...