Karsten Lübke’s research while affiliated with FOM University of Applied Sciences for Economics and Management and other places

Publications (54)


Summary statistics
The joint influence of personality traits on debt ratio
Corporate debt ratios and managerial personality traits: A content analysis of chief executive officers’ speeches at annual general meetings
  • Article
  • Full-text available

March 2025

·

34 Reads

Corporate Ownership and Control

Karsten Lübke

This study contributes to the literature by analysing the joint association of managerial overconfidence, certainty, narcissism, and the Big Five personality traits with debt ratios in the institutional setting of the German two-tier system. Moreover, it provides insights into how corporate governance quality moderates the effects of personality. The analysis relied on the chief executive officers’ (CEOs’) speeches at annual general meetings (AGMs) that were voluntarily disseminated, a novel data source. Managers’ personality traits were measured using software-aided content analysis, and their impact on the debt ratio was analysed using panel regressions. Consistent with previous studies, the debt ratios of German issuers are significantly and positively related to the proxies of managerial certainty and narcissism. However, their model inclusion contributes only marginally to explanatory power. Conversely, the coefficients of the proxies for the Big Five personality traits remained statistically non-significant. Moreover, a significantly negative relationship between debt ratios and the interaction term between a proxy for corporate governance quality and managerial certainty is observed that corresponds to the risk-mitigating impact of corporate governance.


Enhancing data science learning through the use of images

Image analysis represents a crucial and dynamic field within the realm of data science and machine learning, enabling automated interpretation and analysis of images through statistical methods. Beyond its practical applications, such as image recognition, images offer a valuable pedagogical tool for teaching various multivariate analysis techniques, including cluster analysis, principal component analysis, and k nearest neighbors. By employing straightforward R code, images can be transformed into tidy data formats, primed for multivariate analysis. The resultant analysis outcomes can then be translated back into images, affording students the opportunity to visually comprehend the impact of techniques like cluster analysis, principal component analysis or k nearest neighbors. This approach effectively bridges the gap between abstract multivariate analysis concepts and concrete understanding, as students can visually perceive how cluster centroids or principal components reduce the complexity of the original (image) data.
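The abstract refers to R code; the workflow it describes (image to tidy data, cluster analysis, result back to image) can be sketched in a language-neutral way. The following is a minimal Python illustration under stated assumptions: the tiny image, the helper names `to_tidy`, `sq_dist` and `kmeans`, and the choice of starting centroids are all hypothetical and are not taken from the paper.

```python
# Hypothetical 2x4 RGB "image": reddish pixels on the left, bluish on the right.
image = [
    [(250, 10, 10), (240, 20, 15), (10, 10, 250), (20, 15, 240)],
    [(245, 5, 20), (235, 25, 5), (15, 20, 245), (5, 5, 235)],
]

def to_tidy(img):
    """One record per pixel: its position plus its colour channels."""
    return [
        {"row": r, "col": c, "rgb": pixel}
        for r, line in enumerate(img)
        for c, pixel in enumerate(line)
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two colour tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, assign(points, centroids)

tidy = to_tidy(image)
points = [rec["rgb"] for rec in tidy]
# Assumption: start from the first and last pixel as initial centroids.
centroids, labels = kmeans(points, [points[0], points[-1]])

# Translate the clustering result back into an image of centroid colours.
quantized = [[None] * len(image[0]) for _ in image]
for rec, lab in zip(tidy, labels):
    quantized[rec["row"]][rec["col"]] = tuple(round(v) for v in centroids[lab])
```

Displaying `quantized` next to `image` would show each pixel replaced by its cluster centroid, which is the kind of visual feedback the abstract describes.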


Fig. 2. Partial dependence (abscissa) vs. true predictions (ordinate) of training data for the variables duration (left) and status account (right).
Fig. 4. ICE curves for the variable duration (left) and simulation results for the computation time based on subsamples of different size, as well as the resulting distribution of Ŷ for the variable duration (right). Source: authors' own.
How much do we see? On the explainability of partial dependence plots for credit risk scoring

January 2023

·

54 Reads

·

6 Citations

Argumenta Oeconomica

Risk prediction models in credit scoring have to fulfil regulatory requirements, one of which consists in the interpretability of the model. Unfortunately, many popular modern machine learning algorithms result in models that do not satisfy this business need, whereas the research activities in the field of explainable machine learning have strongly increased in recent years. Partial dependence plots denote one of the most popular methods for model-agnostic interpretation of a feature’s effect on the model outcome, but in practice they are usually applied without answering the question of how much can actually be seen in such plots. For this purpose, in this paper a methodology is presented in order to analyse to what extent arbitrary machine learning models are explainable by partial dependence plots. The proposed framework provides both a visualisation, as well as a measure to quantify the explainability of a model on an understandable scale. A corrected version of the German credit data, one of the most popular data sets of this application domain, is used to demonstrate the proposed methodology.
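To illustrate the question of how much a partial dependence plot can show, here is a minimal Python sketch. It is not the measure proposed in the paper: the toy data, the two models and the score `share_explained` (an R²-like ratio comparing partial dependence values with the model's own predictions) are assumptions chosen only for illustration.

```python
from statistics import mean

# Hypothetical toy data: every combination of x1 in {0, 1, 2} and x2 in {-1, 1}.
data = [(x1, x2) for x1 in (0, 1, 2) for x2 in (-1, 1)]

def additive_model(x1, x2):
    """Toy model without interaction."""
    return 2 * x1 + x2

def interaction_model(x1, x2):
    """Toy model whose x1 effect depends entirely on x2."""
    return x1 * x2

def partial_dependence(model, rows, value):
    """Average prediction when x1 is fixed to `value` and x2 keeps its observed values."""
    return mean(model(value, x2) for _, x2 in rows)

def share_explained(model, rows):
    """Illustrative score (an assumption, not the paper's definition):
    1 - SS(prediction - PD) / SS(prediction - mean prediction)."""
    preds = [model(x1, x2) for x1, x2 in rows]
    pd_values = [partial_dependence(model, rows, x1) for x1, _ in rows]
    avg = mean(preds)
    ss_resid = sum((p - d) ** 2 for p, d in zip(preds, pd_values))
    ss_total = sum((p - avg) ** 2 for p in preds)
    return 1 - ss_resid / ss_total

additive_score = share_explained(additive_model, data)
interaction_score = share_explained(interaction_model, data)
print(round(additive_score, 3), round(interaction_score, 3))  # prints 0.727 0.0
```

In the interaction model the partial dependence curve for x1 is flat at zero, so the plot alone would suggest no effect even though the predictions vary with x1. That is the kind of gap a quantitative explainability measure is meant to expose.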


Figure 2. Causal diagram for an example on convenience sample vs. random sample.
The example of random assignment within a randomized controlled trial is embedded in a fictitious study where the teacher tries to analyze the relationship between learning time and test score. Prior knowledge is one reasonable confounder here. (See the causal diagram on the left in Figure 3.) This confounder may even give rise to Simpson's paradox, i.e., observing a negative correlation between learning time and test score, whereas the true (direct) causal effect is positive. Again, randomness erases the arrow pointing into the treatment (see the causal diagram on the right in Figure 3).
Figure 3. Causal diagram for an example on observational study vs. randomized trial
Causal Diagrams for Descriptive Statistics

Without random sampling and/or random allocation, even descriptive statistics such as simple means or proportions can be quite misleading. Therefore, causal diagrams were added to existing course materials to address this topic and to illustrate the differences between random and convenience samples and between observational and experimental studies. We assessed student understanding in different courses with a pre-/post-survey. Additionally, we asked students to evaluate the helpfulness of the diagrams for their understanding. There is a statistically discernible positive effect with 280 students from more than seven different courses on pre- to post-knowledge. Also, most of the students agreed with the statement that the causal diagrams helped in their understanding.
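The learning-time example from the figure description above can be made concrete with a small Python sketch. All numbers are invented for illustration and are not from the study: it is assumed that students with more prior knowledge study less, and that the score is 50 + 2 × learning time + 15 × prior knowledge, so the direct effect of learning time is positive by construction.

```python
# Hypothetical students: (prior_knowledge, learning_time_hours, test_score).
# Assumed data-generating process: more prior knowledge means less learning
# time, and score = 50 + 2 * time + 15 * prior_knowledge.
students = [
    (k, t, 50 + 2 * t + 15 * k)
    for k in (0, 1, 2)
    for t in (9 - 4 * k, 10 - 4 * k, 11 - 4 * k)
]

def slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

# Ignoring the confounder: pooled slope of score on learning time.
overall_slope = slope([(t, s) for _, t, s in students])

# Adjusting for the confounder: slope within each prior-knowledge group.
within_slopes = {
    k: slope([(t, s) for kk, t, s in students if kk == k]) for k in (0, 1, 2)
}

print(overall_slope < 0, within_slopes)
```

The pooled slope is negative while the slope within every prior-knowledge group equals the assumed positive effect of 2, which is the sign reversal the figure description calls Simpson's paradox.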


Fig. 1 Causal diagram for the Covid example. The effect of Vaccination on Mortality is confounded by Age
Causality in statistics and data science education

November 2022

·

174 Reads

·

2 Citations

AStA Wirtschafts- und Sozialstatistisches Archiv

Statisticians and data scientists transform raw data into understanding and insight. Ideally, these insights empower people to act and make better decisions. However, data is often misleading especially when trying to draw conclusions about causality (for example, Simpson’s paradox). Therefore, developing causal thinking in undergraduate statistics and data science programs is important. However, there is very little guidance in the education literature about what topics and learning outcomes, specific to causality, are most important. In this paper, we propose a causality curriculum for undergraduate statistics and data science programs. Students should be able to think causally, which is defined as a broad pattern of thinking that enables individuals to appropriately assess claims of causality based upon statistical evidence. They should understand how the data generating process affects their conclusions and how to incorporate knowledge from subject matter experts in areas of application. Important topics in causality for the undergraduate curriculum include the potential outcomes framework and counterfactuals, measures of association versus causal effects, confounding, causal diagrams, and methods for estimating causal effects.


Fig. 4 Difference between PDP of aluminum and the model's predictions for class 1
Fig. 6 2D PDPs for all classes and the two most explainable variables Mg (abscissa) and Al (ordinate). Each heat map visualizes the average predicted posterior probabilities for the corresponding class as given by the title
Explaining Artificial Intelligence with Care: Analyzing the Explainability of Black Box Multiclass Machine Learning Models in Forensics

May 2022

·

78 Reads

·

5 Citations

KI - Künstliche Intelligenz

In the recent past, several popular failures of black box AI systems and regulatory requirements have increased the research interest in explainable and interpretable machine learning. Among the different available approaches of model explanation, partial dependence plots (PDP) represent one of the most famous methods for model-agnostic assessment of a feature’s effect on the model response. Although PDPs are commonly used and easy to apply, they only provide a simplified view of the model and thus risk being misleading. Relying on a model interpretation given by a PDP can have dramatic consequences in an application area such as forensics, where decisions may directly affect people’s lives. For this reason, in this paper the degree of model explainability is investigated on a popular real-world data set from the field of forensics: the glass identification database. By means of this example, the paper aims to illustrate two important aspects of machine learning model development from the practical point of view in the context of forensics: (1) the importance of a proper process for model selection, hyperparameter tuning and validation, as well as (2) the careful use of explainable artificial intelligence. For this purpose, the concept of explainability is extended to multiclass classification problems such as the one given by the glass data.




Heterogeneity in Class: Clustering Students' Attitudes Towards Statistics

January 2022

·

8 Reads

·

2 Citations

Italian Journal of Applied Statistics

Following the growing need for statistical knowledge and the ability to handle data, an increasing number of degree programs include statistics courses and lectures in their curricula. Especially in non-STEM programmes, but not exclusively, it is possible to find heterogeneous groups of students in terms of attitudes towards these topics, and therefore, deeper insights into these attitudes can help to improve curricula and teaching to better fit the respective student cohort. In a case study, we analyse the results of a survey in which students were asked about their attitude towards statistics, based on the well-known and widely used SATS-36 questionnaire. Our aim is to identify different attitudinal profiles and to make the heterogeneity of student groups visible. By conducting several cluster analyses, we are able to find separable student groups that can be differentiated, e.g., by their interest in statistics, by their confidence in their own abilities, or by their willingness to invest more or less effort into learning for the respective class. It turns out that using the individual items instead of the proposed SATS-36 attitudinal constructs leads to a better separation of the student clusters, as does taking gender into account in a cluster analysis of mixed-type data.


Citations (28)


... Pattern exploration aims to explore the causal relationship between modeling features and target properties. Through statistical analysis methods, such as sensitivity analysis, SHAP and partial dependence plots (PDP) [48][49][50], crucial features could be explained to guide experiments. The essence of machine learning is that algorithms could learn patterns from data for the predictions of the properties of potential candidates. ...

Reference:

Multi-objective optimization in machine learning assisted materials design and discovery
How much do we see? On the explainability of partial dependence plots for credit risk scoring

Argumenta Oeconomica

... al., 2021), and extreme value distributions and analyses under a Bayesian framework (Fawcett, 2018). Others have developed software outside of the Shiny framework for teaching linear regression concepts (Marasinghe, Duckworth, & Shin, 2004), or further topics for probability and inferential statistics using learnR (Stoudt, Scotina, & Luebke, 2022). Few address the linear regression concepts with which students in this course have historically struggled. ...

Supporting Statistics and Data Science Education with learnr
  • Citing Article
  • April 2022

Technology Innovations in Statistics Education

... To optimize the mixtures of DSFC, this study performs the interpretation of the trained model by permutation feature importance (PFI) [60] and partial dependency plots (PDP) [61,62]. ...

Explaining Artificial Intelligence with Care: Analyzing the Explainability of Black Box Multiclass Machine Learning Models in Forensics

KI - Künstliche Intelligenz

... Fairness in credit assessment refers to the equitable treatment of individuals of various backgrounds, regardless of protected characteristics such as race, gender, or age [2]. Biases in credit assessment models can inadvertently lead to discrimination and disparities in access to credit, perpetuating existing social and economic inequalities. ...

Facing the Challenges of Developing Fair Risk Scoring Models

Frontiers in Artificial Intelligence

... While consensus exists on the necessity of data literacy among the general population, educators and policymakers often lack clarity on its specific components, resulting in sporadic efforts to incorporate it into educational standards. To address this, a clearer conceptualization of data literacy is needed, encompassing the knowledge and cognitive skills required for interpreting and evaluating data across diverse contexts, including personal decision-making, civic engagement, and scientific inquiry (Gehrke et al., 2021). This clarification will facilitate the development of assessments, teaching materials, and educational standards tailored to fostering data literacy skills among students of all ages. ...

Statistics education from a data‐centric perspective

Teaching Statistics

Karsten Lübke

·

[...]

... In the context of causal inference, "Temporal dynamics" makes an important contribution to the causal relationships between variables and the outcome [68][69][70][71]. It is important to check whether or not the temporal dynamics exist and, if so, how stable the signal is. ...

Why We Should Teach Causal Inference: Examples in Linear Regression With Simulated Data

Journal of Statistics Education

... Most corporate data science teams are interdisciplinary, drawing on many other prime disciplines, such as computer science, math, statistics, life sciences, social sciences, and applied economics, but they operate outside of those spaces, in a corporate customer service role. The current job market and practice of data science is largely unregulated across most organizations and/or industries in the United States (Lagoze, 2014; Tene & Polonetsky, 2013; Drew, 2016; Drew, 2018; Garzcarek & Steuer, 2019; Hand, 2018a; Leonelli, 2016). Without empirical research focused on ethical data science, there is great potential for unethical analyses, poor organizational decisions, and possible legal ramifications (Benkler, 2019; boyd & Crawford, 2012; Fairfield & Shtein, 2014; Markus & Topi, 2015; Steuer, 2020). ...

Applications in Statistical Computing -- From Music Data Analysis to Industrial Quality Improvement
  • Citing Book
  • October 2019

... The responses so far have been very positive; many teachers and students appreciate the change and there seems to be an improvement in the applicability and conceptual understanding [25]. More precisely, we found that within the business administration department approximately 85 (100) instructors taught our concept in the summer (winter) term of 2019 and 2020. ...

Statistical Computing and Data Science in Introductory Statistics
  • Citing Chapter
  • October 2019

... On the contrary, in case of the probability not lying below the significance level, the null hypothesis cannot be rejected. This does not automatically prove that the null hypothesis is true since it can only be rejected but not proven to be true (Lübke and Vogt, 2014). In this case, the only honest conclusion is: The hypothesis cannot be refuted yet. ...

Angewandte Wirtschaftsstatistik
  • Citing Book
  • January 2014

... Third, studies conducted to increase the prediction accuracy of dividend payment and to improve its modelling (e.g. Jawadi 2009; Won, Kim, and Bae 2012; Longinidis and Symeonidis 2013; Luebke and Rojahn 2016). Our study belongs to the second and third groups simultaneously. ...

Firm-Specific Determinants on Dividend Changes: Insights from Data Mining
  • Citing Chapter
  • August 2016