


This paper introduces the use of Before and After correction Parameter Comparison (BAPC) as a method in explainable artificial intelligence. It is based on a local surrogate approach with the novel feature of interpreting small AI corrections as parameter shifts of an interpretable base model.
Procedia Computer Science 00 (2019) 000–000
International Conference on Industry 4.0 and Smart Manufacturing
Explainability of AI-predictions based on psychological profiling
Simon Neugebauer^a, Lukas Rippitsch^a, Florian Sobieczky^b, Manuela Geiß^b
^a speed-invest heroes GmbH, Praterstr. 1, 1020 Vienna, Austria
^b Software Competence Center Hagenberg GmbH (SCCH), Softwarepark 21, 4232 Hagenberg im Mühlkreis, Austria
Using a local surrogate approach from explainable AI, a new prediction method for the performance of start-up companies based
on psychological profiles is proposed. The method assumes the existence of an interpreted 'base model', the predictions of which
are enhanced by an AI-model delivering corrections that improve the overall accuracy. The surrogate (proxy) models the difference
between the original (labeled) data and the data with labels replaced by the AI-corrections. As this corresponds to comparing the
AI-correction applied before the base model is used with the original utilisation of the base model, the approach is called Before
and After prediction Parameter Comparison (BAPC). The change of the base model under application of the AI-correction yields
an interpretation of it by means of 'effective' parameter changes. This is useful for the interpretation of 'subjective' psychological
profiles (such as 'risk-affinity', 'open-mindedness', etc.) in terms of effective changes of 'objective' monetary firm data (such as
'revenue', 'price of product', or 'cost of development').
© 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license.
Peer-review under responsibility of the scientific committee of the International Conference on Industry 4.0 and Smart Manufacturing.
Keywords: Explainable AI; Local Surrogates; Small Corrections; BAPC; Psychological Profiling
1. Introduction
1.1. Problem: An analytical framework for XAI
Predictive modeling with machine learning can be successfully applied to the problem of estimating the performance
of members of a market [13,14]. Recently, non-linear effects of combinations of features have been shown to
be successfully included [7]. However, these enhancements in classifying a given company as probably successful or
unsuccessful are usually delivered without explanation or interpretation [8] as to why the
corrections are necessary. This paper presents a method for making such improvements over the given predictions of a model with
“interpretable” parameters (such as linear regression or a decision tree) explainable. The approach can be located
in the explainable AI (XAI) literature as a model-agnostic, local surrogate of the correcting machine learning model
[15]. XAI has long been an important topic within machine learning [9,10], but interest has increased strongly in
recent years, mainly because of the legal consequences of AI solutions whose highly accurate but black-box predictive
analytics fail to indicate the cause of a malfunction [1]. The usefulness of predictive modeling in industry as
a manufacturing process management tool is questionable if it merely increases predictive accuracy, particularly
in applications with many potential ways of improving process performance. If there is no hint as to how the
improvement of the prediction comes about, there is no prescription for a systematic change of the underlying standard
procedure that would remove the cause of the need for correction [11]. In the light of these observations, we provide
a solution that makes interpretable specifically those non-linear machine learning models which provide small
corrections to interpretable regression base models. The interpretations are delivered in the form of characteristic changes
of the base model's parameters. This procedure is typical for prescriptive maintenance, in which "to tune the machine
configuration towards less likely faults to happen" (see [12], Sect. 2) is an Industry 4.0 "methodological criterion".
1.2. Predicting the success of start-up enterprises
The use of psychometric profiles to predict the success of potential employees of large companies is well
documented [17,18]. A meta-study by Kerr et al. (2017) [19] determined significant correlations between certain
personality traits and start-ups' financial success and chance of survival. These analyses refer to business
performance (multiple R = .31) and entrepreneurial intentions (multiple R = .36). Regarding the role of the psychological
data, the strongest effect of personality traits on predicting entrepreneurial activities was
exerted by 'openness', followed by 'conscientiousness', 'extroversion' and 'sociability' [20]. Similarly, predictions
can be made about the income of founders [21,22]. According to various studies, entrepreneurs can fail despite
having financial means, a convincing idea and excellent qualifications if the necessary personality traits are
not present [23,24]. This has a significant societal impact, since half of prospective entrepreneurs fail within the
first five years [25,26]. The economic and psychological costs of entrepreneurial failure, such as the loss of savings,
over-indebtedness or unemployment after failure, could be reduced if people who are unsuitable for entrepreneurship
were given proper advice or maintained their status as employees. The skills needed to start up and run a business also
play an important role, as previous research has shown that there are large differences in the extent of such skills [27]
and that they are systematically overestimated and misclassified [28,29,19]. It has also been shown that entrepreneurial
skills correlate positively with some personality factors [30,21,22]. However, these studies have shown that narrower
personality traits calibrated to the start-up sector are more precise predictors, e.g. META (Measure of Entrepreneurial
Tendencies and Abilities) [31,32,33].
Thus, empirical data prove the usefulness of new tools and items calibrated to the start-up area to predict business
activities and success [34,35]. In addition to personality traits, as captured by the Big Five model [37] in the studies
presented, some motivational factors differ significantly between successful and unsuccessful founders, and
non-founders. In particular, a high motivation to perform ('need for achievement') among founders has a long-term positive
effect on the success of start-ups [35]. There is also a lot of potential in studying the cognitive abilities of founders, as
cognitive ability is a powerful predictor of economic outcomes and at the same time is woefully understudied in
the start-up sector. It is intuitively obvious that cognitive ability is fundamental to processing information, decision
making and learning [36].
1.3. Structure
This paper is structured to answer the question of how machine learning models adding corrections to the
predictions of simple interpretable predictive 'base models' can be explained in terms of the parameters of these base
models. In Section 2, we introduce our approach (BAPC) and its general mathematical concept. Section 3 discusses
the applicability to the problem of predicting the economic success of start-up companies and the influence of their
decision makers' (CEOs', employees', etc.) psychological profile data. Section 4 contains the discussion of the results,
particularly comparing them across different 'correcting' AI-models.
Fig. 1. Schematics of BAPC: For the prediction of the outcome $Y$ of a given system, labeled data $\langle X_i, Y_i \rangle$ is handed to an interpretable base model
$f(\cdot;\theta)$ with parameter $\theta$ (Blue box). This yields a residual error $\varepsilon$ which is used as the labels of the AI-corrector's training data $\langle X_i, \varepsilon_i \rangle$, additionally
improving the accuracy of the prediction. In order to deliver an interpretation of the additional correction in terms of a change in the parameter $\theta$,
another instance $f(\cdot;\theta')$ of the base model is trained with adapted (corrected) labels $Y_i - \hat{\varepsilon}_i$ (Red box). The resulting change towards the effective parameter
$\theta'$ is taken as an explanation of the AI-model's correction.
2. ’Small’ AI-corrections
2.1. Before and After prediction Parameter Comparison (BAPC, [3])
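We consider $\langle X_i, Y_i \rangle$ with $i \in \{0,\dots,n-1\}$, a labeled training set in $\mathcal{X} \times \mathcal{Y} = \mathbb{R}^{d+1}$ with $\mathcal{X}$ a $d$-dimensional instance
space and one-dimensional labels $Y_i \in \mathcal{Y}$.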
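Step 1 - First Application of the base model: The base model is a function $f_\theta: \mathcal{X} \to \mathcal{Y}$ with $\theta$ a vector of
parameters having a well-defined geometric meaning.
The standard examples are linear regression (minimisation of the sum of squared residuals, $\mathcal{Y} = \mathbb{R}$ and $\theta \in \mathbb{R}^{d+1}$), or
probabilistic classification, i.e. $\mathcal{Y} = [0,1]$ and $\theta \in \mathbb{R}^2$ (e.g. probit or logistic regression):
$$Y_n = f_\theta(X_n) + \varepsilon_n. \qquad (1)$$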
Step 2 - Application of the AI-correction: In addition to the base model some non-interpretable [9] supervised
machine learning model $A_\eta: \mathcal{X} \to \mathcal{Y}$ is used, with the training data $\eta = \{\langle X_i, \varepsilon_i \rangle\}_{i \in \{0,\dots,n-1\}}$, where $\varepsilon_i = Y_i - f_\theta(X_i)$ is
the residual error of the base model (therefore, $A_\eta$ is also called the difference model).
The complete prediction is $x \mapsto f_\theta(x) + A_\eta(x)$, and thus the equation for the label is given by
$$Y_n = f_\theta(X_n) + A_\eta(X_n) + \Delta\varepsilon_n, \qquad (2)$$
in which the AI-model $A_\eta(X_n) = \hat{\varepsilon}_n$ is estimating the residual $\varepsilon_n = Y_n - f_\theta(X_n)$ and thereby adding additional
accuracy to the predictions of the base model $f_\theta(\cdot)$.
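Step 3 - Second Application of the base model: Now, in order to render the AI-part of the prediction interpretable,
another version of the base model is fitted, but to a different set of data. Namely, the new training set $\{\langle X_i, Y_i - \hat{\varepsilon}_i \rangle\}_{i \in \{0,\dots,n-1\}}$
is used just as in Step 1 in the base model, again:
$$Y_n - \hat{\varepsilon}_n = f_{\theta'}(X_n) + \varepsilon'_n. \qquad (3)$$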
In this way the correction is already applied (to $Y_n$) before the base model $f_{\theta'}(\cdot)$ is fitted, yielding a different vector
of fitting parameters ($\theta'$). The difference of the two parameter vectors can be interpreted in the context of the base
model and thus delivers an interpretation (or explanation) of the effect of the AI-model $A_\eta(\cdot)$. Thus, explainability of
$A_\eta$ is provided locally at $x$ by the comparison of the base model's action on $x$ with the parameters $\theta'$ and $\theta$, i.e. before
and after the correction is applied (hence the name 'Before and After correction Parameter Comparison', BAPC; see
Figure 1 for a flow-diagram of the procedure).
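The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the base model is a one-dimensional least-squares line, and a k-nearest-neighbour average (the hypothetical helper `knn_corrector`) stands in for the non-interpretable corrector $A_\eta$; the data-generating curve and all function names are chosen only for this example.

```python
import random

def fit_line(xs, ys):
    """Base model f_theta(x) = a + b*x fitted by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def knn_corrector(train_x, train_eps, k):
    """Hypothetical stand-in for A_eta: mean residual of the k nearest training points."""
    def predict(x):
        nearest = sorted(zip(train_x, train_eps), key=lambda t: abs(t[0] - x))[:k]
        return sum(e for _, e in nearest) / len(nearest)
    return predict

def bapc(xs, ys, k=15):
    # Step 1: fit the base model and record the residuals eps_i = Y_i - f_theta(X_i).
    a, b = fit_line(xs, ys)
    eps = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Step 2: train the corrector on <X_i, eps_i> and estimate the residuals.
    corrector = knn_corrector(xs, eps, k)
    eps_hat = [corrector(x) for x in xs]
    # Step 3: refit the base model on the corrected labels Y_i - eps_hat_i.
    a2, b2 = fit_line(xs, [y - e for y, e in zip(ys, eps_hat)])
    return (a, b), (a2, b2), (a2 - a, b2 - b)

# Synthetic data: a line plus a small non-linear term and noise (illustrative only).
rng = random.Random(0)
xs = [i / 50 for i in range(100)]
ys = [1.0 + 2.0 * x + 0.3 * x * x + rng.gauss(0.0, 0.05) for x in xs]
theta, theta_prime, delta_theta = bapc(xs, ys)
print("theta:", theta, "theta':", theta_prime, "shift:", delta_theta)
```

In this toy setup, `k=1` makes the corrector reproduce the in-sample residuals exactly, so the refit returns the original parameters; any non-zero shift reflects how the corrector smooths the residuals.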
The idea to 'explain' a small deviation from a state of a well-interpreted system by the parameters of that system
presupposes that the change is not so dramatic as to render the description of the changed system with these
parameters meaningless. A 'small change' is interpreted in the sense of a small transition from an initial to a target
state such that for all intermediate states the direction of the change remains essentially the same. If the states of the
system are given by a suitable subset of a vector space of parameters, then the difference vector of these parameters is
representative of the set of states the system traverses during the transition.
2.2. Comparing Linear Regression Models
If the base model is the linear model $f_\theta(x) = \theta \cdot x$, then the ordinary least squares regression criterion is used to
determine $\theta$ from a given sample. Furthermore, different variants of robust regression methods (like regularisation and
quantile regression) can be used to obtain quite tractable expressions of the change of the parameter vector $\theta$ to $\theta'$.
This approach is taken in a forthcoming paper, and will require a larger systematic examination of the performance of
BAPC in terms of the so-called fidelity [10] of the local surrogate nature of the method.
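For plain ordinary least squares (not the regularised or quantile variants mentioned above), the parameter shift has a simple closed form. The following worked equation is a sketch in our own notation, which does not appear in the original: $X$ denotes the design matrix whose rows are the instances $X_i$ (assumed to have full column rank), $Y$ the vector of labels, and $\hat{\boldsymbol{\varepsilon}}$ the vector of corrector outputs $A_\eta(X_i)$.

```latex
\theta  = (X^\top X)^{-1} X^\top Y, \qquad
\theta' = (X^\top X)^{-1} X^\top \bigl(Y - \hat{\boldsymbol{\varepsilon}}\bigr)
\quad\Longrightarrow\quad
\Delta\theta := \theta' - \theta = -(X^\top X)^{-1} X^\top \hat{\boldsymbol{\varepsilon}} .
```

Under these assumptions the shift is linear in the estimated residuals, so it can be read as the least-squares projection of the correction onto the base model's parameter space.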
2.3. Comparing classification models
A typical property of decision trees (such as those used in CART) is the varying order in which the features
are selected to split the sample space. To compare parameter changes from $\theta$ to $\theta'$, it is necessary that the parameters retain
their interpretation when the base model is applied for the second time. In order to guarantee this, the order of
the first application is recorded and deliberately reused in Step 3. In this way, the qualitative structure of the sample
space splitting will remain the same, while the numerical values can be used to explain the correction of the
non-interpretable machine learning model $A$. We will not use this approach here and postpone the development of BAPC
for classification scenarios.
3. Application: Psychological profiles of newsvendors
3.1. Expectations about the interpretation of psychological profiles
We propose a method to explain psychological profile data used within a random forest model to improve the
predictions of a base model with key performance indicators. The relative importance of a trait varies by the task
studied. Cognitive traits are indicative of performance in a greater variety of tasks. Personality traits are important
in explaining performance in specific tasks, although different personality traits are predictive in different tasks. The
nature of traits, in particular that they remain largely unchanged over a person's lifetime, allows them to be used as
consistent predictors, depending on the situational context. Linking the specific variables with economic outcomes,
namely the venture's key performance indicators (KPI's), will yield clarity about the usefulness of the respective
variable per se and potential moderators.
The goal of the present study is to carry out modeling that uses variables of multiple kinds:
1. KPI’s (Firm internal data)
2. Coded Variables about the product type, market niche, innovativeness, market size etc. (Available from
3. Personality Factors (Big Five)
4. Motivational Factors
5. Cognitive Ability
6. Comprehensive Start-up Specific Variables (e.g. Individual Innovativeness)
7. Socio Demographic Data
Among these, several are not directly measurable and qualify as moderators, mediators, or
other types of features with an indirect effect. We will particularly concentrate on scenarios in which the
effect of variables of type 2 to 7 is small in comparison with variables of type 1. In particular, we will assume that it is
possible to produce the same predictions as a model including all variable types with a 'smaller model' using
only effective KPI's.
In this sense, our setting is that of dominant measurable (i.e. 'objective') variables vs. a set of significant, but less
dominant latent (i.e. 'subjective') variables. For example, while a variable such as 'risk-affinity' is a well-defined
decision-theoretic quantity [42], it may depend on personality factors, such as 'openness to experience' or
'neuroticism', in a complicated way which is, in the modeling approach, only accessible to non-linear machine learning ('AI')
methods. The specific dependency of a company's success on 'risk-affinity' in the sense of a psychological profile may
therefore be too complicated to be explicitly revealed. However, it may be hoped that, using variables of type 1, the
same success rate as the observed one can be produced by a simpler interpretable model in which only KPI-variables
are used (see Discussion).
Accounting for the moderators and making predictions based on team-level analysis is what makes this
research/model unique and novel 1) from a psychological perspective and 2) in the start-up and venture capital
ecosystem. Prior research points out that 'overanalyzing' the individual to predict success in start-ups has some
downsides and, at the same time, that the team's future performance is key. Looking at the previously mentioned factors at
an organizational or team level is rare in the literature, as it is harder to collect data and requires greater resources. For
example, how a team approaches its development and tasks varies based on the context (volatile, stagnant, or gradually
changing), organizational constraints (how much is the company driven by regulation versus market needs?),
operational characteristics (multinational, government-run, or locally confined), and the line of business in which the team
resides. Recent research suggests great potential, as studies found that successful teams were characterized by higher
levels of general cognitive ability, higher extraversion, higher agreeableness, and lower neuroticism than their
unsuccessful counterparts. In successful teams, the heterogeneity of conscientiousness was negatively related to increments
in product performance [37].
Furthermore, in situations where it is desirable to predict how well a team will perform, it appears more valuable to
know the mean level of cognitive ability of members than the score of the highest or lowest scoring individual. Perhaps
the most important finding of this study, however, was the strong evidence of moderation affecting the team-level
relationship for all three operational definitions associated with level. With regard to composing work groups, the main
analysis indicates that the relationship between mean cognitive ability and team performance varies across situations.
Few empirical studies in the literature have examined potential moderators, but theory and empirical research suggest a
potential role for task complexity. Given that task complexity moderates the relationship between cognitive ability and
performance for individuals (Hunter and Hunter [39]), team-level indices of cognitive ability may be more strongly
related to team performance on complex tasks than simple tasks.
Finally, regarding task familiarity, research by Kanfer and Ackerman [38] suggests the relationship between
cognitive ability and task performance decreases over time for individuals as they acquire more experience with a task.
Essentially, they argue that cognitive ability is important in the early stages of learning a new task but becomes
progressively less important as knowledge is acquired and skills become proceduralized. Extending this argument to the
team level, the correlation between team-level cognitive ability and team performance should be highest for novel
tasks and should decrease over time (i.e., repetitions or cycles).
In summary, it is impossible to neglect the influence of psychological factors when estimating the likelihood of
success of small companies which are in the process of establishing their position in the market. It is our intention to
provide an explainable modeling approach in which the 'subjective' nature of latent variables is evaded by mapping
their effect onto a space in which only KPI's are used. The effective KPI's then 'explain', for
single instances (i.e. locally), predictions which can globally only be modeled by including all types of variables, possibly only
with a non-linear, non-interpretable machine learning model.
3.2. Case Study: Success of newsvendors
In order to demonstrate how BAPC can be used to interpret the action of an otherwise non-interpretable machine
learning method, a caricature model is considered to generate synthetic data representing a start-up company's actions
in the market, namely the paradigm of the 'newsvendor problem'. Initially, some amount $q$ of stock
(newspapers) is bought at item cost $c$, which may or may not be sold entirely during the day at item price $p$, depending
on the demand $D$, a random variable: if too much stock is acquired ($D < q$), the investment is not all turned into profit,
while if too little stock has been bought ($q < D$), less profit is made than the prevailing demand allows.
The newsvendor model [4,6] is one of the most often quoted paradigms in operations management. Its first
appearance dates back to work from 1888 by Edgeworth [5] on the optimal amount of bank reserves. It has found numerous
applications in applied economics [16].
A recent discussion [4] reveals several different approaches to solving the newsvendor problem. Among these, the
method introduced by [7] considers a linear machine learning model, to which non-linear combinations of features
are added. As described in Section 2, BAPC follows a similar approach in order to explain the non-linear corrections
provided by a “non-interpretable” AI-model in terms of the interpretable parameters of the underlying base model.
3.3. The risk-affine newsvendor
Similar to the specific newsvendor setting, let $p$ be the price at which an item a company produces can be sold, and
$c$ its production cost. Let $q$ be the number of items produced in a unit time interval (say, a month). Furthermore, let
$F(x) = P[D \le x]$ for $x \ge 0$ be the cumulative distribution function of a random variable $D$, representing the demand
for the considered firm's product.
Let $\mathcal{D} := \{D_i\}_{i=0}^{n-1}$ be an i.i.d. $n$-sized random sample of demands, and $\hat{F}_n$ the cumulative distribution function of the
empirical measure belonging to the sample. As is well known [6], with $F$ being the exact cumulative distribution
function of all $D_i$, the critical fractile $q^*$ is the optimal amount $q$ of stock maximising the expectation of the profit function $\pi: \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$,
given by
$$\pi(q, D) := p \cdot \min(D, q) - c \cdot q. \qquad (4)$$
It is the exact quantile function $F^{-1}$ evaluated at $1 - c/p = (p - c)/p$, the fraction of the cost of unmet demand and
unsold items. By $\hat{q}^*$ we denote the estimation of the critical fractile, given by
$$\hat{q}^* = \hat{F}_n^{-1}\left(1 - \frac{c}{p}\right).$$
In order to use complete symbolism, these estimated quantities would have to be expressed in dependence of the
sample size $n$ (like $\hat{F}_n$ etc.), as well as some element $\omega$ of the appropriate underlying incidence space (like $\hat{q}^*$
instead of $\hat{q}^*_n(\lambda, \omega)$). However, as it is clear that $\mathcal{D}$ is a sample of fixed size and the quantities depending on it are
(derived) random variables, we omit such details for the sake of conciseness (see [3] for more formal detail).
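The profit function (4) and the estimated critical fractile can be written down directly. The sketch below assumes that the empirical quantile is taken as the generalised inverse of the empirical distribution function (the smallest sample value $x$ with $\hat{F}_n(x) \ge 1 - c/p$); the function names are ours and the demand values are made up for illustration.

```python
import math

def profit(q, demand, price, cost):
    """Newsvendor profit pi(q, D) = p * min(D, q) - c * q."""
    return price * min(demand, q) - cost * q

def empirical_critical_fractile(demands, price, cost):
    """Empirical quantile of the demand sample at level 1 - c/p."""
    level = 1.0 - cost / price
    ordered = sorted(demands)
    # Generalised inverse of the empirical CDF: smallest x with F_n(x) >= level.
    index = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[index]

demands = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q_hat = empirical_critical_fractile(demands, price=2.0, cost=1.0)
print(q_hat)                          # 5
print(profit(q_hat, 3, 2.0, 1.0))     # 1.0  (demand below the order: unsold stock)
print(profit(q_hat, 8, 2.0, 1.0))     # 5.0  (demand above the order: everything sold)
```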
3.4. Preparation of the artificial data set
In order to have a synthetic data set which carries the features we wish to detect with BAPC, we
1. choose fixed constants $c$ for cost and $p$ for price,
2. draw a sample $\mathcal{D}$ of observed demands $D_i$, $i \in \{0,\dots,n-1\}$, where $n$ represents the number of months for which
the company has stored data,
3. for each $i$ determine the number of items produced: $q_i := \hat{q}^*_i + r_i$, where $\hat{q}^*$ is the estimated critical fractile of the
sample, and $r_i$ is a deviation from the optimal $\hat{q}^*_i$ arising from the specific (personal) strategy of the company's
decision maker (expressing risk-affinity or risk-aversion, cf. [4,16]),
4. and calculate the resulting profits $\pi_i := \pi(q_i, D_i)$.
5. We also calculate $S_i$, the indicator for whether the company was successful in the $i$-th month, i.e. whether $\pi(q_i, D_i) > 0$
(indicated by $S_i = 1$), or whether there was no positive profit ($S_i = 0$), i.e. whether $\pi(q_i, D_i) \le 0$.
The $r_i$ are 'perturbations' of the estimated critical fractile $\hat{q}^*$, which we inject so as to have data
that can be corrected by an additional (non-interpretable) machine learning model in Step 2.
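The five preparation steps can be sketched as follows. Several choices here are assumptions made for the example rather than details given in this section: the demand is drawn from an exponential distribution with rate $p \cdot \lambda$ (the choice made later in Section 4.1), a single estimate $\hat{q}^*$ is used for all $i$, and every second order is perturbed by a fixed `shift`. The function and field names are hypothetical.

```python
import math
import random

def profit(q, demand, price, cost):
    """Newsvendor profit pi(q, D) = p * min(D, q) - c * q."""
    return price * min(demand, q) - cost * q

def make_dataset(n, price=2.0, cost=1.0, lam=1.0, shift=1.0, seed=0):
    rng = random.Random(seed)
    # Step 1 is the choice of the constants price and cost (function arguments).
    # Step 2: observed demands; Exp(rate = p * lambda) is an assumption taken from Section 4.1.
    demands = [rng.expovariate(price * lam) for _ in range(n)]
    # Estimated critical fractile: empirical quantile at level 1 - c/p.
    ordered = sorted(demands)
    q_hat = ordered[max(math.ceil((1.0 - cost / price) * n) - 1, 0)]
    rows = []
    for i, d in enumerate(demands):
        r = 0.0 if i % 2 == 0 else shift      # Step 3: unperturbed vs. risk-affine order
        q = q_hat + r
        pi = profit(q, d, price, cost)        # Step 4: resulting profit
        s = 1 if pi > 0 else 0                # Step 5: success indicator
        rows.append({"D": d, "q": q, "r": r, "profit": pi, "S": s})
    return q_hat, rows

q_hat, rows = make_dataset(100)
print("estimated critical fractile:", q_hat)
print("successful months:", sum(row["S"] for row in rows), "of", len(rows))
```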
3.5. BAPC applied to probabilistic regression
We now carry out the program suggested in Section 2.1. BAPC is a regression approach, so for binary
classification it is necessary to consider a numerical variable indicating the probability of success. For our base model we
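choose the (non-linear) probabilistic regression model $f_\theta: \mathbb{R}_+ \to [0,1]$, where $\theta = \lambda$, and $f_\lambda(D_i) = \hat{S}_i(\lambda)$ (the 'hat'
indicates that the estimation of $q^*$ is involved) with the link function [44]
where $1_+(\cdot)$ is the indicator function for the positive real numbers $\mathbb{R}_+$ and $N_i = \{D \in \mathcal{D} \mid |D - D_i| < \delta\}$. Note the
dependency on $\delta$, which is suppressed in the notation for the sake of conciseness. We call $\hat{S}_i(\lambda)$ a success indicator
and explicitly require that its values remain in $[0,1]$. As the non-interpretable AI-correction model $A_\eta(D)$, we choose
the non-linear machine learning models 'rf' and 'nnet' from the R library 'caret' [40]. The training data $\eta$ is the set
$\{\langle D_i, \mathrm{Lgt}(\varepsilon_i) \rangle\}_{i=0}^{n-1}$, with $\varepsilon_i = S_i - \hat{S}_i(\lambda_n)$ for $i \in \{0,\dots,n-1\}$, and $\mathrm{Lgt}(x) = \log\bigl((1+x)/(1-x)\bigr)$ (logit function scaled
to $x \in [-1,1]$). The parameter $\lambda_n$ is determined in Step 1 and Step 3 by minimising the residual errors $\varepsilon_i$ and $\varepsilon'_i$. Let
$\boldsymbol{\varepsilon}_n = \langle \varepsilon_0, \dots, \varepsilon_{n-1} \rangle^T$ and $\boldsymbol{\varepsilon}'_n = \langle \varepsilon'_0, \dots, \varepsilon'_{n-1} \rangle^T$ be the vectors of the residual errors of the respective training sets.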
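The scaled logit transform and its back-transformation can be sketched as follows. The clipping `margin` is an added safeguard that is not part of the original description: residuals of exactly $\pm 1$ would otherwise map to $\pm\infty$. The inverse follows from solving $y = \log((1+x)/(1-x))$ for $x$, which gives $x = \tanh(y/2)$.

```python
import math

def lgt(x, margin=1e-6):
    """Lgt(x) = log((1 + x) / (1 - x)) for x in [-1, 1]; clipping is an added safeguard."""
    x = max(-1.0 + margin, min(1.0 - margin, x))
    return math.log((1.0 + x) / (1.0 - x))

def lgt_inverse(y):
    """Back-transformation to the residual scale: x = (e^y - 1) / (e^y + 1) = tanh(y / 2)."""
    return math.tanh(y / 2.0)

for eps in (-1.0, -0.4, 0.0, 0.3, 1.0):
    print(eps, lgt(eps), lgt_inverse(lgt(eps)))
```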
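1. First base model application:
$$S_n = \hat{S}_n(\lambda_n) + \varepsilon_n, \quad \text{where } \lambda_n = \arg\min_{\lambda} \|\boldsymbol{\varepsilon}_n(\lambda)\|_2.$$
This means that $\hat{S}_n$ is calculated and the residual error $\varepsilon_n$ is recorded.
2. Application of the AI-correction:
With the data set $\{\langle D_i, \mathrm{Lgt}(\varepsilon_i) \rangle\}_{i=0}^{n-1}$, a random forest model (or neural network nnet) $A_\eta(D)$ for the residuals
is trained as a function of $D$. The back-transformed estimate $\hat{\varepsilon}_n$ is truncated, if necessary, such that $S_n - \hat{\varepsilon}_n \in [0,1]$.
3. Second base model application:
$$S_n - \hat{\varepsilon}_n = \hat{S}_n(\lambda'_n) + \varepsilon'_n, \quad \text{where } \lambda'_n = \arg\min_{\lambda} \|\boldsymbol{\varepsilon}'_n(\lambda)\|_2.$$
The base model is applied again, and a different parameter $\lambda'_n$ is obtained.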
The difference vector $\Delta\lambda_n := \lambda'_n - \lambda_n$ is recorded and can be interpreted as the correction of $A_\eta(D_n)$ applied to the
base model prediction $S_n$. It is important to note that the correction depends on $D_n$ and is therefore to be interpreted
locally for each instance. For a different value of $D_n$ there might be not only a different magnitude of $\hat{\varepsilon}_n$, but also a
different interpretation $\Delta\lambda_n = \lambda'_n - \lambda_n$. As discussed below, the local nature of BAPC is also reflected in aggregate
assessments, e.g. when different values of $\delta$ imply different values of the success indicator.
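The three steps for the newsvendor case can be sketched end to end. This is an illustration under explicitly stated assumptions, not the authors' R implementation: the form of $\hat{S}_i(\lambda)$ used in `base_model` (share of successes among the demands within $\delta$ of $D_i$ when $q^*(\lambda) = \log(p/c)/(p\lambda)$ is ordered) is our guess based on the surrounding definitions; a k-nearest-neighbour average replaces 'rf'/'nnet'; the arg min is taken over a coarse grid; and the labels are generated with the exact critical quantile for $\lambda = 1$.

```python
import math
import random

P, C = 2.0, 1.0  # price and cost, the values used in Section 4.1

def success(q, d):
    """S = 1 if the profit p*min(D, q) - c*q is positive, else 0."""
    return 1 if P * min(d, q) - C * q > 0 else 0

def base_model(lam, demands, delta):
    """ASSUMED form of S_hat_i(lambda): share of successful demands D with
    |D - D_i| < delta when the critical quantile q*(lambda) is ordered."""
    q_star = math.log(P / C) / (P * lam)
    flags = [success(q_star, d) for d in demands]
    preds = []
    for d_i in demands:
        hood = [f for d, f in zip(demands, flags) if abs(d - d_i) < delta]
        preds.append(sum(hood) / len(hood))  # never empty: D_i is its own neighbour
    return preds

def fit_lambda(demands, labels, delta, grid):
    """Grid search for the lambda minimising the squared residual norm."""
    def loss(lam):
        preds = base_model(lam, demands, delta)
        return sum((s - p) ** 2 for s, p in zip(labels, preds))
    return min(grid, key=loss)

def lgt(x, margin=1e-6):
    """Lgt(x) = log((1 + x) / (1 - x)), clipped away from +-1 (added safeguard)."""
    x = max(-1.0 + margin, min(1.0 - margin, x))
    return math.log((1.0 + x) / (1.0 - x))

def knn_mean(train_x, train_y, x, k):
    """Hypothetical stand-in for 'rf'/'nnet': mean target of the k nearest demands."""
    nearest = sorted(zip(train_x, train_y), key=lambda t: abs(t[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def bapc_newsvendor(demands, labels, delta=0.1, k=10):
    grid = [0.2 + 0.05 * j for j in range(57)]               # candidate lambdas in [0.2, 3.0]
    lam = fit_lambda(demands, labels, delta, grid)            # Step 1
    s_hat = base_model(lam, demands, delta)
    eps = [s - p for s, p in zip(labels, s_hat)]
    z = [lgt(e) for e in eps]                                 # Step 2, on the logit scale
    eps_hat = [math.tanh(knn_mean(demands, z, d, k) / 2.0) for d in demands]
    # Truncate the corrected labels S - eps_hat to [0, 1].
    corrected = [min(1.0, max(0.0, s - e)) for s, e in zip(labels, eps_hat)]
    lam_prime = fit_lambda(demands, corrected, delta, grid)   # Step 3
    return lam, lam_prime, lam_prime - lam

rng = random.Random(1)
demands = [rng.expovariate(P * 1.0) for _ in range(100)]
q_opt = math.log(P / C) / P                                   # exact critical quantile for lambda = 1
labels = [success(q_opt + (i % 2), d) for i, d in enumerate(demands)]  # r_i alternates 0, 1
lam, lam_prime, shift = bapc_newsvendor(demands, labels)
print("lambda:", lam, "lambda':", lam_prime, "shift:", shift)
```

The grid search keeps the sketch dependency-free; a one-dimensional optimiser would be the natural replacement when a finer resolution of the shift is needed.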
4. Results and Discussion
4.1. Use Case: The exponentially distributed demand
For the sake of illustration we choose an exponential distribution with rate $p \cdot \lambda$ for the demand. This models the
declining demand with rising price by a linear function. It yields a critical quantile of $q^* = \frac{1}{p \cdot \lambda} \log \frac{p}{c}$. When sampling
the demand, we choose $p = 2$, $c = 1$, $\lambda = 1$, and in Step 3 of the data-preparation process (Section
3.4) we set the perturbation $r_i$ to 0 for half of the values (Black) and to $+1$ for the other half (Blue), corresponding to a risk-favouring
order quantity in which a higher maximal profit is possible at the price of a smaller expected profit (see
Figure 2, Top-Left for a sample of size $n = 100$). The unperturbed and perturbed data are also plotted versus the
demand (Center) and versus the true success (Right). In the center picture, the critical quantile of about 0.34 is seen
to occur at the bend of the unperturbed (Black) curve. In the lower row of Figure 2, the correction (Left) by the
random forest or neural network model $A_\eta$, giving the estimated residual error $\hat{\varepsilon}_i$, is shown to be positive exclusively
for some unperturbed values and negative exclusively for some perturbed values. Adding this correction to the success
estimator (Center: Black for unperturbed, and Blue for perturbed) yields values (Red), which help successful
Fig. 2. Top Row - Left: The true gain (= profit) of a sample of 50 optimally chosen (Black) orders $q_i$, together with 50 further perturbed orders (Blue)
chosen too high ($r_i = +1$). Center: Same versus demand $D_i$. Right: Same, versus True Success. Lower Row - Left: Performance of trained corrector
$A_\eta$. Center: Success estimator before and with added estimated correction (Red). They shift to the right for $\pi(\hat{q}^*(\lambda), D_i) > 0$ and mostly to the left
otherwise. Right: Fitted success estimator. The fact that more unperturbed successful companies (Black) are estimated to be more successful than
in the case of $\hat{S}_i(\lambda)$ shows the successful application of the correction ('Before Correction' has the same effect as 'After Correction').
incidents to have a larger predicted success indicator and unsuccessful ones to have a lower prediction of success.
The condition that the norm of the vector of residuals $\|\boldsymbol{\varepsilon}_n(\lambda)\|_2$ of Step 1 (and $\|\boldsymbol{\varepsilon}'_n(\lambda)\|_2$ in Step 3) be minimal
determines $\lambda^*$ and $\lambda'^*$, respectively (Figure 3, Left). This results in a difference between these two parameters, for which the histogram
of an experiment with 100 repetitions (shown in Figure 3, Center) gives evidence of the typical parameter shift
from $\lambda^*$ to $\lambda'^*$. It is seen to be downwards. A lower parameter $\lambda$ corresponds to a higher critical quantile $\frac{1}{\lambda p} \log \frac{p}{c}$. This
shows that the original data, which contain the sub-optimal orders, perform on average as if there were less demand
than in the corrected case. In this way, the interpretation of what the correction $A_\eta$ achieves on average is delivered
in the form of the expected parameter shift in the underlying statistical model: it is a larger effective demand.
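The relation between $\lambda$ and the critical quantile can be checked numerically. The sketch below uses the closed form $q^* = \log(p/c)/(p\lambda)$ for exponentially distributed demand with rate $p \cdot \lambda$ and the constants $p = 2$, $c = 1$ from above; the function names are ours.

```python
import math

def critical_quantile(lam, price=2.0, cost=1.0):
    """q* = log(p / c) / (p * lambda) for demand ~ Exp(rate = p * lambda)."""
    return math.log(price / cost) / (price * lam)

def exp_cdf(x, lam, price=2.0):
    """CDF of the exponential demand distribution with rate p * lambda."""
    return 1.0 - math.exp(-price * lam * x)

q_star = critical_quantile(1.0)
print(round(q_star, 4))                  # 0.3466
print(round(exp_cdf(q_star, 1.0), 4))    # 0.5, i.e. the level 1 - c/p
print(critical_quantile(0.8) > q_star)   # True: a lower lambda gives a higher critical quantile
```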
Fig. 3. Left: Typical minimization result of $\|\boldsymbol{\varepsilon}\|$ and $\|\boldsymbol{\varepsilon}'\|$ yielding $\lambda^*$ (Black) and $\lambda'^*$ (Blue). Red curves are the Loess-smoothed residual errors
(span = 0.25). The error $\|\boldsymbol{\varepsilon}'\|$ is typically larger, as it corresponds to the testing error, and $\|\boldsymbol{\varepsilon}\|$ to the training error. Center: Histograms of the
parameter shift $\lambda'^* - \lambda^*$ for 100 iterations of Monte Carlo cross-validation with samples of size 100 for $\delta = 0.1$ for 'rf' (Light
Blue), and 'nnet' (Red). Right: The optimal neighbourhood-determining parameter $\delta$ is found by minimisation of the standard deviation of the
parameter shift. The optimal $\delta$ occurring at this 'intermediate' value shows the eminent role of the locality of BAPC: using aggregate local success
estimations in a specific neighbourhood of $D_i$ increases precision versus using isolated point data for the training process ($\delta = 0$).
4.2. Discussion of the Experiment
After producing a data set of demands $D_i$, order sizes $q_i$ and success indicators $S_i$ of size $2N$, with half of the entries
of $\hat{q}^*_i(\lambda)$ 'perturbed' by the value $r_i$, a random subset of size $N$ is chosen as a training set. We use stratified Monte Carlo
cross-validation [43], in which a random split into two equal-sized folds with an equal number of perturbed values in each
fold is produced. One fold is used as a training set for the correction model, which is applied to the other
fold, for which the prediction $\hat{S}(\lambda)$ and the correction $A_\eta$ are computed. This process is repeated $n$ times and the average
of the results for $\lambda'^* - \lambda^*$ is reported. The neighbourhood-generating parameter $\delta$ has been found to be optimal at 0.1
by means of variance minimisation of the empirical parameter shift distribution. The use of the statistical model of a
parametrized distribution as a base model closely resembles the probit model for binary classification. However,
the link function (here: $\hat{S}_i(\lambda)$) is not linear in $\lambda$ [44]. The correcting model $A_\eta$ ('rf' or 'nnet' from R's caret library) is
trained on the values obtained from inserting the residual errors into the logit function. The prediction is transformed
back onto the original scale of the success indicators, so that $\hat{\varepsilon}_i \in [-1,1]$. This step, borrowed from logit regression, has proven to
be essential to obtain sufficiently accurate estimates of $\lambda'^*$ and $\lambda^*$. For the specific application of the method presented
here, it remains to be explained how the psychological profile characteristics (2. to 7. in the list of Section 3.1) should
be used to model risk-welcoming behaviour. For this we refer to the discussion on 'neuroticism' [41], or 'sociability'
[20] in connection with the newsvendor model.
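The stratified Monte Carlo cross-validation described above can be sketched as follows. The helper names are ours, and `evaluate` is a hypothetical callback standing for whatever is computed on one split (for instance the shift $\lambda'^* - \lambda^*$); the sketch only shows the splitting and averaging logic.

```python
import random

def stratified_split(perturbed_flags, rng):
    """Random split of the indices into two equal folds with the same number
    of perturbed entries in each fold (stratification by the perturbation flag)."""
    fold_a, fold_b = [], []
    for flag in (False, True):
        idx = [i for i, f in enumerate(perturbed_flags) if f == flag]
        rng.shuffle(idx)
        half = len(idx) // 2
        fold_a += idx[:half]
        fold_b += idx[half:]
    return fold_a, fold_b

def monte_carlo_cv(perturbed_flags, evaluate, repetitions=100, seed=0):
    """Average of evaluate(train_idx, test_idx) over repeated random stratified splits."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        train_idx, test_idx = stratified_split(perturbed_flags, rng)
        results.append(evaluate(train_idx, test_idx))
    return sum(results) / len(results)

# 2N = 200 entries, every second one perturbed (illustrative layout).
flags = [i % 2 == 1 for i in range(200)]
train_idx, test_idx = stratified_split(flags, random.Random(0))
print(len(train_idx), len(test_idx), sum(flags[i] for i in train_idx))
```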
5. Conclusion:
We have shown that explainability of a machine learning model acting as a corrector to a parametric base model
can be established in the framework of model-agnostic, local surrogates. The results are presented in terms of the
interpretations which the parameter shifts of the base model provide. For the case of predicting the success of start-up companies,
the method of BAPC connects data of psychological profiles (represented by the machine learning model) with 'hard'
KPI business variables via the parameter changes of the base model. Our outlook is to apply BAPC to various
predictive maintenance tasks on production data such as 'perturbed' time series of sensor readings. The ability to place
high-performing machine learning models (such as deep learning) into the framework of a conventional base model
(such as probit regression) is highly attractive, as in this case the base model is likely to emerge from the understanding
of the underlying physical processes.
Acknowledgements
This work has been supported by the project ’inAIco’ (FFG-Project No. 862019; Bridge Young Scientist, 2020),
as well as the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research
and Economy, and the Province of Upper Austria in the frame of the COMET center SCCH.
References
[1] Turek, M. (2016). “Explainable Artificial Intelligence (XAI), DARPA-BAA-16-53”: Proposers Day Slides.
[2] Filippini, Massimo, and Lester C. Hunt. (2011) “Energy demand and energy efficiency in the OECD countries: a stochastic demand frontier approach.” Energy Journal 32 (2): 59–80.
[3] Sobieczky F. (2020), “A local surrogate model for explainable machine learning corrections using before and after estimation parameter comparison (BAPC)”, (preprint).
[4] Schweitzer, M.E., and Cachon, G.P. (2000). “Decision bias in the newsvendor problem with a known demand distribution: Experimental evidence”. Management Science 46 (3): 404–420.
[5] Edgeworth, F. Y. (1888). “The Mathematical Theory of Banking”. Journal of the Royal Statistical Society 51 (1): 113–127.
[6] Porteus, Evan L. “The newsvendor problem”. In D. Chhajed and T. J. Lowe, editors, Building Intuition: Insights From Basic Operations Management Models and Principles, chapter 7, 115–134. Springer, 2008.
[7] Cynthia Rudin and Gah-yi Vahn. “The big data newsvendor: Practical insights from machine learning analysis”. Cambridge, Mass.: MIT Sloan School of Management, 2013.
[8] Došilović, F. K., Brčić, M. and Hlupić, N. “Explainable artificial intelligence: A survey,” in 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2018.
[9] A. John (2019), “Artificial Intelligence and Black-Box Medical Decisions: Accuracy versus Explainability,” Hastings Center Report, Vol. 49, No. 1, 15–21.
[10] Carvalho, D. V., Pereira E. M., and Cardoso J. S. (2019) “Machine Learning Interpretability: A Survey on Methods and Metrics,” Electronics, Vol. 8, No. 8, p. 832.
[11] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, R. Chatila, F. Herrera (2020). “XAI: Concepts, taxonomies, opportunities and challenges toward responsible AI”, Jour. Inf. Fusion 58, p. 82–115.
[12] A. Diez-Olivan, J. Del Ser, D. Galar, B. Sierra (2019) “Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0”, Information Fusion 50, pp. 92–111.
[13] Doloi H. (2009) “Analysis of pre-qualification criteria in contractor selection and their impacts on project success”, Construction Management and Economics, 27:12, 1245–1263.
[14] Altman, E. I. (1968) “Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy”, J. of Fin., Vol. 23 (4), p. 589–610.
[15] Molnar C. (2019), “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable”, e-book, Chap. 5.7.
[16] Yan Q., Ruoxuan W., Asoo J. V., Yuwen C., Michelle M. H. S. (2011) “The newsvendor problem: Review and directions for future research”, European Journal of Operational Research 213, 361–374.
[17] Batey M., Chamorro-Premuzic T., and Furnham A. (2009) “Intelligence and personality as predictors of divergent thinking: The role of general, fluid and crystallised intelligence,” Thinking Skills and Creativity, Vol. 4, No. 1.
[18] Ones, D. S., Dilchert, S., Chokalingham V., and Judge T. A. (2007) “In support of personality assessment in organizational settings,” Personnel Psychology, Vol. 60, No. 4, p. 33.
[19] Kerr W., Nanda R., and Rhodes-Kropf M. (2014) “Entrepreneurship as Experimentation,” J. of Econ. Perspectives, Vol. 28, No. 3, pp. 25–48.
[20] Zhao H., Seibert S., and Lumpkin G. (2006) “The Big Five personality dimensions and entrepreneurial status: A meta analytical review,” Journal of Applied Psychology, Vol. 91, No. 2, 259–271.
[21] R. Levine and Y. Rubinstein (2018) “Selection into Entrepreneurship and Self-Employment,” doi:10.3386/w25350.
[22] Manso G. (2016) “Experimentation and the Returns to Entrepreneurship,” The Review of Financial Studies, Vol. 29, No. 9, 2319–2340.
[23] Kalleberg A., and Leicht K. (1991) “Gender and Organizational Performance: Determinants of Small Business Survival and Success,” Academy of Management Journal, Vol. 34, No. 1, 136–161.
[24] Shaver K., and Scott L. (1992) “Person, Process, Choice: The Psychology of New Venture Creation,” Entrepreneurship Theory and Practice, Vol. 16, No. 2, 23–46.
[25] Helmers C., and Rogers M. (2019) “Innovation and the Survival of New Firms in the UK,” Review of Industrial Organization, Vol. 36, No. 3.
[26] Quatraro F., and Vivarelli M. (2014) “Drivers of Entrepreneurship and Post-entry Performance of Newborn Firms in Developing Countries,” The World Bank Research Observer, Vol. 30, No. 2, 277–305.
[27] Astebro T., and Chen J. (2014) “The entrepreneurial earnings puzzle: Mismeasurement or real,” Journal of Business Venturing, Vol. 29, No. 1.
[28] Bernardo A., and Welch I. (2001) “On the Evolution of Overconfidence and Entrepreneurs,” Journal of Economics and Management Strategy, Vol. 10, No. 3, 301–330.
[29] Koellinger P., Minniti M., and Schade C. (2007) “‘I think I can, I think I can’: Overconfidence and entrepreneurial behavior,” Journal of Economic Psychology, Vol. 28, No. 4, 502–527.
[30] Caliendo M., Fossen F., and Kritikos A. (2014) “Personality characteristics and the decisions to become and stay self-employed,” Small Business Economics, Vol. 42, No. 4, 787–814.
[31] Ahmetoglu G., Leutner F., and Chamorro-Premuzic T. (2011) “EQ-nomics: Understanding the relationship between individual differences in Trait Emotional Intelligence and entrepreneurship,” Personality and Individual Differences, Vol. 51, No. 8, 1028–1033.
[32] P. Almeida, G. Ahmetoglu and T. Chamorro-Premuzic (2013) “Who Wants to Be an Entrepreneur? The Relationship Between Vocational Interests and Individual Differences in Entrepreneurship,” Journal of Career Assessment, Vol. 22, No. 1, pp. 102–112.
[33] Leutner F., Ahmetoglu G., Akhtar R., and Chamorro-Premuzic T. (2014) “The relationship between the entrepreneurial personality and the Big Five personality traits,” Personality and Individual Differences, Vol. 63, 58–63.
[34] Rauch A., and Frese M. (2009) “A Personality Approach to Entrepreneurship”, doi:10.1093/oxfordhb/9780199234738.003.0006.
[35] Collins C., Hanges P. J., and Locke E. (2004) “The Relationship of Achievement Motivation to Entrepreneurial Behavior: A Meta-Analysis”, Human Performance, Vol. 17, No. 1, 95–117.
[36] Borghans L., Duckworth A. L., Heckman J. J., ter Weel B. (2008) “The Economics and Psychology of Personality Traits”, The Journal of Human Resources, Vol. 43, No. 4, 972–1059.
[37] Kichuk S. L., Wiesner W. H. (1997) “The big five personality factors and team performance: implications for selecting successful product design teams”, Journal of Engineering and Technology Management, Vol. 14, Iss. 3-4.
[38] Kanfer, R., and Ackerman, P. L. (1989). “Motivation and cognitive abilities: An integrative/aptitude-treatment interaction approach to skill acquisition”, Journal of Applied Psychology, 74(4), 657–690.
[39] Hunter, J. E., and Hunter, R. F. (1984). “Validity and utility of alternative predictors of job performance”, Psychological Bulletin, 96(1), 72–98.
[40] Kuhn, M. (2008). “Building Predictive Models in R Using the caret Package”, Journal of Statistical Software, 28(5), 1–26.
[41] Harrison J. S., Thurgood G. R., Boivie S., Pfarrer M. D. (2020) “Perception Is Reality: How CEOs’ Observed Personality Influences Market Perceptions of Firm Risk and Shareholder Returns”, Acad. of Management Jour. 63, No. 4.
[42] Bartholomae F., Wiens M. (2016) “Spieltheorie: Ein anwendungsorientiertes Lehrbuch”, p. 11.
[43] Dubitzky W., Granzow M., Berrar D. (2007) “Fundamentals of data mining in genomics and proteomics”, Springer Sc. & Bus. Med., p. 178.
[44] Rencher A. C., Schaalje G. B. (2007) “Linear Models in Statistics”, doi:10.1002/9780470192610, Chap. 18, p. 507–516.