Content uploaded by Florian Sobieczky
Author content
All content in this area was uploaded by Florian Sobieczky on Jan 08, 2021
Content may be subject to copyright.
Available online at www.sciencedirect.com
Procedia Computer Science 00 (2019) 000–000
www.elsevier.com/locate/procedia
International Conference on Industry 4.0 and Smart Manufacturing
Explainability of AI-predictions based on psychological profiling
Simon Neugebauer (a), Lukas Rippitsch (a), Florian Sobieczky (b), Manuela Geiß (b)
(a) speed-invest heroes GmbH, Praterstr. 1, 1020 Vienna, Austria
(b) Software Competence Center Hagenberg GmbH (SCCH), Softwarepark 21, 4232 Hagenberg im Mühlkreis, Austria
Abstract
Using a local surrogate approach from explainable AI, a new prediction method for the performance of start-up companies based
on psychological profiles is proposed. The method assumes the existence of an interpreted ’base model’, the predictions of which
are enhanced by an AI-model delivering corrections that improve the overall accuracy. The surrogate (proxy) models the difference
between the original (labeled) data and the data with labels replaced by the AI-corrections. As this corresponds to comparing the
AI-correction applied before the base model is used with the original utilisation of the base model, the approach is called Before
and After prediction Parameter Comparison (BAPC). The change of the base model under application of the AI-correction yields
an interpretation of it by means of ’effective’ parameter changes. This is useful for the interpretation of ’subjective’ psychological
profiles (such as ’risk-affinity’, ’open-mindedness’, etc.) in terms of effective changes of ’objective’ monetary firm data (such as
’revenue’, ’price of product’, or ’cost of development’).
© 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Industry 4.0 and Smart Manufacturing.
Keywords: Explainable AI; Local Surrogates; Small Corrections; BAPC; Psychological Profiling
1. Introduction
1.1. Problem: An analytical framework for XAI
Predictive modeling with machine learning can be successfully applied to the problem of estimating the performance of members of a market [13,14]. Recently, non-linear effects of combinations of features have been shown to be successfully included [7]. However, these enhancements in classifying a given company as probably successful or unsuccessful are usually delivered without explanation or interpretation [8] as to the origin of the necessity for the corrections. This paper presents a method for making such improvements over the given predictions of a model with "interpretable" parameters (such as linear regression or a decision tree) explainable. The approach can be localised in the explainable AI (XAI) literature as a model-agnostic, local surrogate of the correcting machine learning model [15]. XAI has long been an important topic within machine learning [9,10] but has recently seen a strong increase of interest, mainly because highly accurate but black-box predictive analytics fail to provide indications of the cause of malfunction, with legal consequences for AI solutions [1]. The usefulness of predictive modeling in industry as a manufacturing process management tool merely for increasing predictive accuracy is questionable in the context of applications with many avenues of potential improvement in process performance. If there is no hint as to how the improvement of the prediction comes about, there is no prescription for a systematic change of the underlying standard procedure so as to remove the cause of the need for correction [11]. In the light of these observations, we provide a solution to make specifically those non-linear machine learning models interpretable which provide small corrections to interpretable regression base models. The interpretations are delivered in the form of characteristic changes of the base model's parameters. This procedure is typical for prescriptive maintenance, in which "to tune the machine configuration towards less likely faults to happen" (see [12], Sect. 2) is an Industry 4.0 "methodological criterion".

E-mail address: Lukas.Rippitsch@speedinvest-heroes.com, Florian.Sobieczky@scch.at
1877-0509 © 2021 The Authors. Published by Elsevier B.V.
1.2. Predicting the success of start-up enterprises
The use of psychometric profiles to predict the success of potential employees of large companies is well documented [17,18]. A meta-study by Kerr et al. (2017) [19] determined significant correlations between certain personality traits and start-ups' financial success and chance of survival. These analyses refer to business performance (multiple R = .31) and entrepreneurial intentions (multiple R = .36). Regarding the role of the psychological data, the strongest effect of personality traits on predicting entrepreneurial activities was exercised by 'openness', followed by 'conscientiousness', 'extroversion' and 'sociability' [20]. Similarly, predictions can be made about the income of founders [21,22]. According to various studies, entrepreneurs can fail despite having financial means, a convincing idea and an excellent qualification, if the necessary personality traits are not present [23,24]. This has a significant societal impact, since half of all prospective entrepreneurs fail within the first five years [25,26]. The economic and psychological costs of entrepreneurial failure, such as the loss of savings, over-indebtedness or unemployment after failure, could be reduced if people who are unsuitable for entrepreneurship were given proper advice or maintained their status as employees. The skills needed to start up and run a business also play an important role, as previous research has shown that there are large differences in the extent of such skills [27] and that they are systematically overestimated and misclassified [28,29,19]. It has also been shown that entrepreneurial skills correlate positively with some personality factors [30,21,22]. However, these studies have shown that narrower personality traits calibrated to the start-up sector are more precise predictors, e.g. META = Measure of Entrepreneurial Tendencies and Abilities [31,32,33].
Thus, empirical data prove the usefulness of new tools and items calibrated to the start-up area to predict business
activities and success [34,35]. In addition to personality traits, as captured by the Big Five model [37] in the studies
presented, some motivational factors differ significantly between successful and unsuccessful founders, and non-
founders. In particular, a high motivation to perform (’need for achievement’) among founders has a long-term positive
effect on the success of start-ups [35]. There is also a lot of potential in studying cognitive abilities of founders, as
cognitive ability is a powerful predictor of economic outcomes and at the same time is woefully understudied in
the start-up sector. It is intuitively obvious that cognitive ability is fundamental to processing information, decision
making and learning [36].
1.3. Structure
This paper is structured to answer the question of how machine learning models that add corrections to the predictions of simple interpretable 'base models' can be explained in terms of the parameters of these base models. In Section 2, we introduce our approach (BAPC) and its general mathematical concept. Section 3 discusses its applicability to the problem of predicting the economic success of start-up companies and the influence of their decision makers' (CEOs', employees', etc.) psychological profile data. Section 4 contains the discussion of the results, particularly comparing different 'correcting' AI models.
Fig. 1. Schematics of BAPC: For the prediction of the outcome Y of a given system, labeled data ⟨X_i, Y_i⟩ is handed to an interpretable base model f(·; θ) with parameter θ (blue box). This yields a residual error ε which is used as the labels of the AI-corrector's training data ⟨X_i, ε_i⟩, additionally improving the accuracy of the prediction. In order to deliver an interpretation of the additional correction in terms of a change in the parameter θ, another instance of the base model f(·; θ′) is fitted with adapted (corrected) labels Y_i − ε̂_i (red box). The resulting change towards the effective parameter θ′ is taken as an explanation of the AI-model's correction.
2. ’Small’ AI-corrections
2.1. Before and After prediction Parameter Comparison (BAPC, [3])
We consider ⟨X_i, Y_i⟩ with i ∈ {0, ..., n−1}, a labeled training set in 𝒳 × 𝒴 = ℝ^{d+1}, with 𝒳 a d-dimensional instance space and one-dimensional labels Y_i ∈ 𝒴.

Step 1 - First application of the base model: The base model is a function f_θ : 𝒳 → 𝒴, with θ a vector of parameters having a well-defined geometric meaning:

Y_n = f_θ(X_n) + ε_n.    (1)

The standard examples are linear regression (minimisation of the sum of squared residuals, 𝒴 = ℝ and θ ∈ ℝ^{d+1}) and probabilistic classification, i.e. 𝒴 = [0, 1] and θ ∈ ℝ² (e.g. probit or logistic regression).
Step 2 - Application of the AI-correction: In addition to the base model, some non-interpretable [9] supervised machine learning model A_η : 𝒳 → 𝒴 is used, with the training data η = {⟨X_i, ε_i⟩}_{i ∈ {0,...,n−1}}, where ε_i = Y_i − f_θ(X_i) is the residual error of the base model (therefore, A_η is also called the difference model).
The complete prediction is x ↦ f_θ(x) + A_η(x), and thus the equation for the label is given by

Y_n = f_θ(X_n) + A_η(X_n) + Δε_n,    (2)

in which the AI-model A_η(X_n) = ε̂_n estimates the residual ε_n = Y_n − f_θ(X_n), thereby adding additional accuracy to the predictions of the base model f_θ(·).
Step 3 - Second application of the base model: Now, in order to render the AI-part of the prediction interpretable, another version of the base model is fitted, but to a different set of data. Namely, the new training set {⟨X_i, Y_i − ε̂_i⟩}_{i=0}^{n−1} is used just as in Step 1 in the base model, again:

Y_n − ε̂_n = f_{θ′}(X_n) + ε′_n.    (3)

In this way the correction is already applied (to Y_n) before the base model f_{θ′}(·) is fitted, yielding a different vector of fitting parameters (θ′). The difference of the two parameter vectors can be interpreted in the context of the base model and thus delivers an interpretation (or explanation) of the effect of the AI-model A_η(·). Thus, explainability of A_η is provided locally at x by comparing the base model's action on x with the parameters θ′ and θ - before and after the correction is applied (hence the name 'Before and After correction Parameter Comparison' - BAPC; see Figure 1 for a flow-diagram of the procedure).
The idea of 'explaining' a small deviation from a state of a well-interpreted system by the parameters of that system relies on the change not being so dramatic as to render the description of the changed system with these parameters meaningless. A 'small change' is interpreted in the sense of a small transition from an initial to a target state such that for all intermediate states the direction of the change remains essentially the same. If the states of the system are given by a suitable subset of a vector space of parameters, then the difference vector of these parameters is representative of the set of states the system traverses during the transition.
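The three steps above can be sketched in code. The following is a minimal illustration in our own notation (not the authors' implementation), assuming a one-dimensional linear base model and, as a stand-in for the non-interpretable corrector A_η, a simple k-nearest-neighbour regressor on the residuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a linear trend plus a localized non-linear bump the base model misses.
n = 200
X = rng.uniform(0.0, 1.0, size=n)
Y = 2.0 * X + 0.5 * np.exp(-((X - 0.5) ** 2) / 0.01) + rng.normal(0.0, 0.05, size=n)

def fit_base(X, Y):
    """Ordinary least squares for f_theta(x) = theta[0] + theta[1] * x."""
    A = np.column_stack([np.ones_like(X), X])
    theta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return theta

def predict_base(theta, X):
    return theta[0] + theta[1] * X

# Step 1: first application of the base model; record residuals eps_i = Y_i - f_theta(X_i).
theta = fit_base(X, Y)
eps = Y - predict_base(theta, X)

# Step 2: "AI"-correction A_eta trained on <X_i, eps_i>; here k-nearest neighbours.
def A_eta(x, k=10):
    idx = np.argsort(np.abs(X - x))[:k]
    return eps[idx].mean()

eps_hat = np.array([A_eta(x) for x in X])

# Step 3: second application of the base model on the corrected labels Y_i - eps_hat_i.
theta_prime = fit_base(X, Y - eps_hat)

# The parameter shift is BAPC's interpretation of the correction.
delta_theta = theta_prime - theta
print(delta_theta)
```

Since the corrector absorbs the bump, Y − ε̂ lies closer to a straight line than Y does, and the shift Δθ expresses, in the base model's own parameters, what the corrector changed.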
2.2. Comparing Linear Regression Models
If the base model is the linear model f_θ(x) = θ · x, then the ordinary least squares regression criterion is used to determine θ from a given sample. Furthermore, different variants of robust regression methods (such as regularisation and quantile regression) can be used to obtain quite tractable expressions for the change of the parameter vector from θ to θ′. This approach is taken in a forthcoming paper and will require a larger systematic examination of the performance of BAPC in terms of the so-called fidelity [10] of the local surrogate nature of the method.
2.3. Comparing classification models
A typical property of decision trees (such as those used in CART) is the varying order in which the features are selected to split the sample space. To compare parameter changes from θ to θ′, it is necessary that the parameters retain their interpretation when the base model is applied for the second time. In order to guarantee this, the splitting order of the first application is recorded and reused in Step 3. In this way, the qualitative structure of the sample-space splitting remains the same, while the numerical values can be used to explain the correction of the non-interpretable machine learning model A. We will not use this approach here and postpone the development of BAPC for classification scenarios.
3. Application: Psychological profiles of newsvendors
3.1. Expectations about the interpretation of psychological profiles
We propose a method to explain psychological profile data used within a random forest model to improve the predictions of a base model built on key performance indicators. The relative importance of a trait varies with the task studied. Cognitive traits are indicative of performance in a greater variety of tasks. Personality traits are important in explaining performance in specific tasks, although different personality traits are predictive in different tasks. The nature of traits, in particular their remaining largely unchanged over a lifetime, allows them to be used as consistent predictors, depending on the situational context. Linking the specific variables with economic outcomes, namely the ventures' key performance indicators (KPIs), will yield clarity about the usefulness of the respective variable per se and potential moderators.
The goal of the present study is to carry out modeling that uses variables of multiple kinds:
1. KPIs (firm-internal data)
2. Coded variables about the product type, market niche, innovativeness, market size, etc. (available from Techcrunch/Crunchbase.com)
3. Personality factors (Big Five)
4. Motivational factors
5. Cognitive ability
6. Comprehensive start-up-specific variables (e.g. individual innovativeness)
7. Socio-demographic data
Among these, several are not directly measurable and qualify as moderators or mediators, or transmit other types of features with an indirect effect. We will particularly concentrate on scenarios in which the effect of variables of types 2 to 7 is small in comparison with variables of type 1. In particular, we will assume that it is possible to reproduce the predictions of a model including all variable types with a 'smaller model' using only effective KPIs.
In this sense, our setting is that of dominant measurable (i.e. 'objective') variables vs. a set of significant, but less dominant latent (i.e. 'subjective') variables. For example, while a variable such as 'risk-affinity' is a well-defined decision-theoretic quantity [42], it may depend on personality factors, such as 'openness to experience' or 'neuroticism', in a complicated way which is, in the modeling approach, only accessible to non-linear machine learning ('AI') methods. The specific dependency of a company's success on 'risk-affinity' in the sense of a psychological profile may therefore be too complicated to be explicitly revealed. However, it may be hoped that, using variables of type 1, the same success rate as the observed one can be reproduced by a simpler interpretable model in which only KPI variables are used (see Discussion).
Accounting for the moderators and making predictions based on team-level analysis is what makes this research/model unique and novel from 1) a psychological perspective and 2) within the start-up and venture capital ecosystem. Prior research points out that 'overanalyzing' the individual to predict success in start-ups has downsides, and at the same time that the team's future performance is key. Looking at the previously mentioned factors at an organizational or team level is rare in the literature, as it is harder to collect data and requires greater resources. For example, how a team approaches its development and tasks varies based on the context (volatile, stagnant, or gradually changing), organizational constraints (how much is the company driven by regulation versus market needs?), operational characteristics (multinational, government-run, or locally confined), and the line of business in which the team resides. Recent research suggests great potential, as studies found that successful teams were characterized by higher levels of general cognitive ability, higher extraversion, higher agreeableness, and lower neuroticism than their unsuccessful counterparts. In successful teams, the heterogeneity of conscientiousness was negatively related to increments in product performance [37].
Furthermore, in situations where it is desirable to predict how well a team will perform, it appears more valuable to
know the mean level of cognitive ability of members than the score of the highest or lowest scoring individual. Perhaps
the most important finding of this study, however, was the strong evidence of moderation affecting the team-level
relationship for all three operational definitions associated with level. With regard to composing work groups, the main
analysis indicates that the relationship between mean cognitive ability and team performance varies across situations.
Few empirical studies in the literature have examined potential moderators, but theory and empirical research suggest a
potential role for task complexity. Given that task complexity moderates the relationship between cognitive ability and
performance for individuals (Hunter and Hunter [39]), team-level indices of cognitive ability may be more strongly
related to team performance on complex tasks than simple tasks.
Finally, regarding task familiarity, research by Kanfer and Ackerman [38] suggests the relationship between cog-
nitive ability and task performance decreases over time for individuals as they acquire more experience with a task.
Essentially, they argue that cognitive ability is important in the early stages of learning a new task but becomes pro-
gressively less important as knowledge is acquired and skills become proceduralized. Extending this argument to the
team level, the correlation between team-level cognitive ability and team performance should be highest for novel
tasks and should decrease over time (i.e., repetitions or cycles).
In summary, it is impossible to neglect the influence of psychological factors when estimating the likelihood of success of small companies which are in the process of establishing their position in the market. It is our intention to provide an explainable modeling approach in which the 'subjective' nature of latent variables is evaded by mapping their effect onto a space in which only KPIs are used. The effective KPIs then 'explain' locally, for single instances, what can globally only be modeled by including all types of variables, possibly only with a non-linear, non-interpretable machine learning model.
3.2. Case Study: Success of newsvendors
In order to demonstrate how BAPC can be used to interpret the action of an otherwise non-interpretable machine learning method, a caricature model is considered to generate synthetic data representing a start-up company's actions in the market: the paradigm of the 'newsvendor problem', in which initially some amount q of stock (newspapers) is bought at item cost c and may or may not be sold entirely during the day at item price p, depending on the demand D, a random variable. If too much stock is acquired (D < q), the investment is not all turned into profit, while if too little stock has been bought (q < D), less profit is made than the prevailing demand allows.
The newsvendor model [4,6] is one of the most often quoted paradigms in operations management. Its first appearance dates back to work from 1888 by Edgeworth [5] on the optimal amount of bank reserves. It has found numerous applications in applied economics [16].
A recent discussion [4] reveals several different approaches to solving the newsvendor problem. Among these, the method introduced by [7] considers a linear machine learning model to which non-linear combinations of features are added. As described in Section 2, BAPC follows a similar approach in order to explain the non-linear corrections provided by a 'non-interpretable' AI-model in terms of the interpretable parameters of the underlying base model.
3.3. The risk-affine newsvendor
Similar to the specific newsvendor setting, let p be the price at which an item the company produces can be sold, and c its production cost. Let q be the number of items produced in a unit time interval (say, a month). Furthermore, let F(x) = P[D ≤ x] for x ≥ 0 be the cumulative distribution function of a random variable D, representing the demand for the considered firm's product.
Let 𝒟 := {D_i}_{i=0}^{n−1} be an i.i.d. n-sized random sample of demands, and F̂_n the cumulative distribution function of the empirical measure belonging to the sample. As is well known [6], with F being the exact cumulative distribution function of all D_i, the critical fractile q* is the optimal amount q of stock maximising the profit function π : ℝ⁺ × ℝ⁺ → ℝ, given by

π(q, D) := p · min(D, q) − c · q.    (4)

This is the exact quantile function evaluated at 1 − c/p = (p − c)/p, the fraction of the cost of unmet demand and unsold items. By q̂* we denote the estimation of the critical fractile, given by

q̂* := F̂⁻¹(1 − c/p).    (5)

In order to use complete symbolism, these estimated quantities would have to be expressed in dependence on the sample size n (like F̂_n etc.), as well as on some element ω of the appropriate underlying incidence space Ω (writing q̂*(λ) instead of q̂*_n(λ, ω)). However, as it is clear that 𝒟 is a sample of fixed size and the quantities depending on it are (derived) random variables, we omit such details for the sake of conciseness (see [3] for more formal detail).
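As a concrete sketch (our own notation, not code from the paper), the estimated critical fractile of Eq. (5) is simply an empirical quantile of the demand sample. With exponentially distributed demand at rate p·λ, it can be checked against the closed form q* = (1/(pλ)) log(p/c) used later in Section 4.1:

```python
import numpy as np

def profit(q, D, p, c):
    """Newsvendor profit pi(q, D) = p * min(D, q) - c * q  (Eq. 4)."""
    return p * np.minimum(D, q) - c * q

def critical_fractile(demands, p, c):
    """q_hat* = F_hat^{-1}(1 - c/p)  (Eq. 5): empirical quantile of the sample."""
    return np.quantile(demands, 1.0 - c / p)

rng = np.random.default_rng(42)
p, c, lam = 2.0, 1.0, 1.0
sample = rng.exponential(scale=1.0 / (p * lam), size=100_000)  # Exp(rate = p*lam) demand

q_hat = critical_fractile(sample, p, c)
q_exact = np.log(p / c) / (p * lam)  # closed form; here log(2)/2 ≈ 0.347
print(q_hat, q_exact)
```

For p = 2 and c = 1, the critical fractile is the median of the demand distribution, since 1 − c/p = 1/2.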
3.4. Preparation of the artificial data set
In order to have a synthetic data set which carries the features we wish to detect with BAPC, we

1. choose fixed constants c for cost and p for price,
2. draw a sample 𝒟 of observed demands D_i, i ∈ {0, ..., n−1}, where n represents the number of months for which the company has stored data,
3. for each i determine the number of items produced: q_i := q̂*_i + r_i, where q̂* is the estimated critical fractile of the sample, and r_i is a deviation from the optimal q̂*_i emerging from the specific (personal) strategy of the company's decision maker (expressing risk-affinity or risk-aversion, cf. [4,16]),
4. calculate the resulting profits π_i := π(q_i, D_i),
5. and calculate S_i, the indicator of whether the company was successful in the i-th month: S_i = 1 if π(q_i, D_i) > 0, and S_i = 0 if there was no increase in profit, i.e. π(q_i, D_i) ≤ 0.

The r_i are thus 'perturbations' of the estimated critical fractile q̂*, which we inject so as to have data which an additional (non-interpretable) machine learning model can correct for in Step 2.
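Steps 1-5 above can be sketched as follows (a hypothetical generator in our own naming, following the exponential-demand choice of Section 4.1, with half of the orders optimal and half perturbed by r):

```python
import numpy as np

def make_data(n=100, p=2.0, c=1.0, lam=1.0, r=1.0, seed=0):
    """Synthetic newsvendor months: half optimal orders, half perturbed by r."""
    rng = np.random.default_rng(seed)
    # 2. sample of observed monthly demands D_i (exponential, rate p*lam as in Sec. 4.1)
    D = rng.exponential(scale=1.0 / (p * lam), size=n)
    # 3. orders q_i = q_hat* + r_i, with r_i = 0 for one half and r_i = r for the other
    q_hat = np.quantile(D, 1.0 - c / p)          # estimated critical fractile
    r_i = np.where(np.arange(n) < n // 2, 0.0, r)
    q = q_hat + r_i
    # 4. resulting profits pi_i = p * min(D_i, q_i) - c * q_i
    pi = p * np.minimum(D, q) - c * q
    # 5. success indicator S_i = 1 iff the month was profitable
    S = (pi > 0).astype(int)
    return D, q, pi, S

D, q, pi, S = make_data()
```

The second half of the months then carries the risk-affine perturbation that the AI-correction is meant to detect.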
3.5. BAPC applied to probabilistic regression
We now carry out the program suggested in Section 2.1. BAPC is a regression approach, so for binary classification it is necessary to consider a numerical variable indicating the probability of success. For our base model we choose the (non-linear) probabilistic regression model f_θ : ℝ⁺ → [0, 1], where θ = λ and f_λ(D_i) = Ŝ_i(λ) (the 'hat' indicates that the estimation of q* is involved), with the link function [44]

Ŝ_i(λ) = (1/|N(i)|) · Σ_{D ∈ N(i)} 1₊(π(q̂*(λ), D)),    (6)

where 1₊(·) is the indicator function of the positive real numbers ℝ⁺ and N(i) = {D ∈ 𝒟 : |D − D_i| < δ}. Note the dependency on δ, which is suppressed in the notation for the sake of conciseness. We call Ŝ_i(λ) a success indicator and explicitly require that its values remain in [0, 1]. As the non-interpretable AI-correction model A_η(D), we choose the non-linear machine learning models 'rf' and 'nnet' from the R library 'caret' [40]. The training data η is the set {⟨D_i, Lgt(ε_i)⟩}_{i=0}^{n−1}, with ε_i = S_i − Ŝ_i(λ_n) for i ∈ {0, ..., n−1}, and Lgt(x) = log((1 + x)/(1 − x)) (the logit function scaled to x ∈ [−1, 1]). The parameter λ_n is determined in Step 1 and Step 3 by minimising the residual errors ε_i and ε′_i. Let ε̄ = ⟨ε_0, ..., ε_{n−1}⟩ᵀ and ε̄′ = ⟨ε′_0, ..., ε′_{n−1}⟩ᵀ be the vectors of the residual errors of the respective training sets.

1. First base model application:

S_n = Ŝ_n(λ_n) + ε_n,  where λ_n = argmin_λ ‖ε̄(λ)‖².    (7)

This means that Ŝ_n is calculated and the residual error ε_n is recorded.

2. Application of the AI-correction:
With the data set {⟨D_i, Lgt(ε_i)⟩}_{i=0}^{n−1}, a random forest model (or neural network 'nnet') A_η(D) for the residuals is trained as a function of D. The back-transformed estimate ε̂_n is truncated, if necessary, such that S_n − ε̂_n ∈ [0, 1].

3. Second base model application:

S_n − ε̂_n = Ŝ_n(λ′_n) + ε′_n,  where λ′_n = argmin_λ ‖ε̄′(λ)‖².    (8)

The base model is applied again, and a different parameter λ′_n is obtained.

The difference vector Δλ_n := λ′_n − λ_n is recorded and can be interpreted as the correction of A_η(D_n) applied to the base model prediction S_n. It is important to note that the correction depends on D_n and is therefore to be interpreted locally for each instance. For a different value of D_n there might not only be a different magnitude of ε̂_n, but also a different interpretation Δλ_n = λ′_n − λ_n. As discussed below, the local nature of BAPC is also reflected in aggregate assessments, e.g. when different values of δ imply different values of the success indicator.
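A minimal sketch of the link function (6), assuming the exponential-demand closed form q*(λ) = (1/(pλ)) log(p/c) from Section 4.1 (helper names are ours, not the paper's):

```python
import numpy as np

def success_indicator(sample, i, lam, p=2.0, c=1.0, delta=0.1):
    """Eq. (6): average, over the delta-neighbourhood N(i) of D_i, of the
    indicator that ordering the critical fractile q*(lambda) is profitable."""
    q_star = np.log(p / c) / (p * lam)                   # q*(lambda), exponential demand
    neigh = sample[np.abs(sample - sample[i]) < delta]   # N(i); always contains D_i itself
    prof = p * np.minimum(neigh, q_star) - c * q_star    # pi(q*(lambda), D) for D in N(i)
    return float((prof > 0.0).mean())                    # value in [0, 1] by construction

demands = np.array([0.01, 0.05, 0.09, 1.00, 1.05])
print(success_indicator(demands, i=3, lam=1.0))  # high-demand month: profitable
print(success_indicator(demands, i=1, lam=1.0))  # low-demand month: unprofitable
```

Being an average of indicators, the value automatically lies in [0, 1], and the dependence on the neighbourhood radius δ is explicit.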
4. Results and Discussion
4.1. Use Case: The exponentially distributed demand
For the sake of illustration we choose an exponential distribution with rate p·λ for the demand. This models the decline of demand with rising price by a linear function, and yields a critical quantile of q* = (1/(p·λ)) log(p/c). When sampling the demand, we choose p = 2, c = 1, λ = 1 and, in Step 3 of the data-preparation process (Section 3.4), set half of the perturbations r_i to 0 (black) and the other half equal to +1 (blue), corresponding to a risk-favouring order quantity in which a higher maximal profit is possible under the constraint of a smaller expected profit (see Figure 2, top-left, for a sample of size n = 100). The unperturbed and perturbed data are also plotted versus the demand (center) and versus the true success (right). In the center picture, the critical quantile of about 0.34 is seen to occur at the bend of the unperturbed (black) curve. In the lower row of Figure 2, the correction (left) by the random forest or neural network model A_η giving the estimated residual error ε̂_i is shown to be positive exclusively for some unperturbed values and negative exclusively for some perturbed ones. Adding this correction to the success estimator (center: black for unperturbed, and blue for perturbed) yields values (red) which help successful incidents to have a larger predicted success indicator and unsuccessful ones a lower prediction of success. The condition that the vector of residuals ‖ε̄(λ)‖² of Step 1 (and ‖ε̄′(λ)‖² in Step 3) be minimal, respectively determining λ* and λ′* (Figure 3, left), results in a difference between these two parameters, for which the histogram of an experiment with 100 repetitions (shown in Figure 3, center) gives evidence of the typical parameter shift from λ* to λ′*: it is downwards. A lower parameter λ corresponds to a higher critical quantile (1/(λp)) log(p/c). This exhibits that the original data containing the sub-optimal orders perform on average as if there were less demand than in the corrected case. In this way, the interpretation of what the correction A_η achieves on average is delivered in the form of an expected parameter shift in the underlying statistical model: a larger effective demand.

Fig. 2. Top row - left: The true gain (= profit) of a sample of 50 optimally chosen (black) orders q_i, together with 50 further perturbed orders (blue) chosen too high (r_i = +1). Center: same versus demand D_i. Right: same versus true success. Lower row - left: performance of the trained corrector A_η. Center: success estimator before and with added estimated correction (red). The values shift to the right for π(q̂(λ*), D_i) > 0 and mostly to the left otherwise. Right: fitted success estimator. The fact that more unperturbed successful companies (black) are estimated to be more successful than in the case of Ŝ_i(λ) shows the successful application of the correction ('before correction' has the same effect as 'after correction').
Fig. 3. Left: Typical minimization result of ‖ε̄‖ and ‖ε̄′‖, yielding λ (black) and λ′ (blue). Red curves are the Loess-smoothed residual errors (span = 0.25). The error ‖ε̄′‖ is typically larger, as it corresponds to the testing error, while ‖ε̄‖ corresponds to the training error. Center: histograms of the parameter shift λ′* − λ* for 100 iterations of Monte Carlo cross-validation with samples of size 100, for δ = 0.1, for 'rf' (light blue) and 'nnet' (red). Right: the optimal neighborhood-determining parameter δ is found by minimisation of the standard deviation of the parameter shift. The optimal δ occurring at an 'intermediate' value shows the eminent role of the locality of BAPC: using aggregate local success estimations in a specific neighborhood of D_i increases precision versus using isolated point data for the training process (δ = 0).
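The direction of the parameter shift can be read off directly from the critical-quantile formula of Section 4.1: a lower λ means a higher q*(λ), i.e. the corrected model behaves as if facing a larger effective demand. A one-line check (our notation):

```python
import numpy as np

def critical_quantile(lam, p=2.0, c=1.0):
    """q*(lambda) = (1/(p*lam)) * log(p/c) for Exp(rate = p*lam) demand (Sec. 4.1)."""
    return np.log(p / c) / (p * lam)

# A downward shift lambda' < lambda raises the critical quantile:
print(critical_quantile(1.0))   # with p = 2, c = 1: log(2)/2
print(critical_quantile(0.9))   # larger value: acts like more demand
```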
4.2. Discussion of the Experiment
After producing a data set of demands D_i, order sizes q_i and success indicators S_i of size 2N, with half of the entries of q̂_i(λ) 'perturbed' by the value r_i, a random subset of size N is chosen as a training set. We use stratified Monte Carlo cross-validation [43], in which a random, equal-sized split into two folds with an equal number of perturbed values in each fold is produced. One fold is used as a training set for the correction model, which is then applied to the other fold, for which the prediction Ŝ(λ) and the correction A_η are performed. This process is repeated n times, and the average of the results for λ′* − λ* is reported. The neighbourhood-generating parameter δ has been found to be optimal at 0.1 by means of variance minimisation of the empirical parameter-shift distribution. The use of a statistical model with a parametrized distribution as base model closely resembles the probit model for binary classification; however, the link function (here: Ŝ_i(λ)) is not linear in λ [44]. The correcting model A_η ('rf' or 'nnet' from R's caret library) is trained on the values obtained by inserting the residual errors into the logit function. The prediction is transformed back onto the original scale of the success indicators, so that Ŝ_i ∈ [−1, 1]. This step from logit regression has proven to be essential for obtaining sufficiently accurate estimates of λ′* and λ*. For the specific application of the method presented here, it remains to be explained how the psychological profile characteristics (2. to 7. in the list of Section 3.1) should be used to model risk-welcoming behaviour. For this, we refer to the discussion of 'neuroticism' [41] and 'sociability' [20] in connection with the newsvendor model.
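The stratified split and the logit round trip described above can be sketched as follows. This is a Python sketch under our own assumptions (the experiments used R's caret; the function names, the clipping tolerance, and the rescaling constants are illustrative), with the correction model itself omitted.

```python
import numpy as np

def stratified_folds(perturbed, rng):
    """Split indices into two equal folds, each containing the same
    number of perturbed entries (stratified Monte Carlo CV)."""
    idx_p = np.flatnonzero(perturbed)    # indices of perturbed entries
    idx_u = np.flatnonzero(~perturbed)   # indices of unperturbed entries
    rng.shuffle(idx_p)
    rng.shuffle(idx_u)
    hp, hu = len(idx_p) // 2, len(idx_u) // 2
    fold1 = np.concatenate([idx_p[:hp], idx_u[:hu]])
    fold2 = np.concatenate([idx_p[hp:], idx_u[hu:]])
    return fold1, fold2

def to_logit(s):
    """Map success indicators from [-1, 1] to the real line:
    rescale to (0, 1), then apply the logit link."""
    p = np.clip((s + 1.0) / 2.0, 1e-6, 1.0 - 1e-6)
    return np.log(p / (1.0 - p))

def from_logit(x):
    """Transform corrections back onto the original [-1, 1] scale."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0
```

In this sketch, the correction model would be trained on `to_logit` applied to the residual errors of one fold and its predictions on the other fold mapped back with `from_logit`; the round trip is exact up to the clipping tolerance.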
5. Conclusion
We have shown that explainability of a machine learning model acting as a corrector to a parametric base model can be established in the framework of model-agnostic, local surrogates. The results are presented in terms of the interpretations which the parameter shifts of the base model provide. For the case of predicting the success of start-up companies, the BAPC method connects data of psychological profiles (represented by the machine learning model) with 'hard' KPI business variables via the parameter changes of the base model. Our outlook is to apply BAPC to various predictive maintenance tasks on production data, such as 'perturbed' time series of sensor readings. The ability to place high-performing machine learning models (such as deep learning) into the framework of a conventional base model (such as probit regression) is highly attractive, as in this case the base model is likely to emerge from an understanding of the underlying physical processes.
Acknowledgements
This work has been supported by the project ’inAIco’ (FFG-Project No. 862019; Bridge Young Scientist, 2020),
as well as the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research
and Economy, and the Province of Upper Austria in the frame of the COMET center SCCH.
References
[1] Turek, M. (2016). “Explainable Artificial Intelligence (XAI), DARPA-BAA-16-53”: Proposers Day Slides,
https://www.darpa.mil/program/explainable-artificial-intelligence.
[2] Filippini, Massimo, and Lester C. Hunt. (2011) “Energy demand and energy efficiency in the OECD countries: a stochastic demand frontier
approach.” Energy Journal 32 (2): 59–80.
[3] Sobieczky F. (2020), “A local surrogate model for explainable machine learning corrections using before and after estimation parameter
comparison (BAPC)”, (preprint)
[4] Schweitzer, M. E., and Cachon, G. P. (2000). "Decision bias in the newsvendor problem with a known demand distribution: Experimental evidence". Management Science 46 (3): 404–420.
[5] Edgeworth, F. Y. (1888). “The Mathematical Theory of Banking”. Journal of the Royal Statistical Society. 51 (1): 113–127
[6] Porteus, Evan L. (2008). "The newsvendor problem". In D. Chhajed and T. J. Lowe (eds.), Building Intuition: Insights From Basic Operations Management Models and Principles, chapter 7, 115–134. Springer.
[7] Rudin, C., and Vahn, G.-y. (2013). "The big data newsvendor: Practical insights from machine learning analysis". Cambridge, Mass.: MIT Sloan School of Management.
[8] Došilović, F. K., Brčić, M., and Hlupić, N. (2018). "Explainable artificial intelligence: A survey", in 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO).
[9] London, A. J. (2019). "Artificial Intelligence and Black-Box Medical Decisions: Accuracy versus Explainability", Hastings Center Report, Vol. 49, No. 1, 15–21.
[10] Carvalho, D. V., Pereira, E. M., and Cardoso, J. S. (2019). "Machine Learning Interpretability: A Survey on Methods and Metrics", Electronics, Vol. 8, No. 8, p. 832.
[11] Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., Chatila, R., and Herrera, F. (2020). "XAI: Concepts, taxonomies, opportunities and challenges toward responsible AI", Information Fusion 58, 82–115.
[12] Diez-Olivan, A., Del Ser, J., Galar, D., and Sierra, B. (2019). "Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0", Information Fusion 50, 92–111.
[13] Doloi H. (2009) “Analysis of pre-qualification criteria in contractor selection and their impacts on project success”, Construction Management
and Economics, 27:12, 1245–1263
[14] Altman, E. I. (1968). "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy", The Journal of Finance, Vol. 23, No. 4, 589–610.
[15] Molnar C. (2019), “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable”, e-book, leanpub.com, Chap. 5.7.
[16] Qin, Y., Wang, R., Vakharia, A. J., Chen, Y., and Seref, M. M. H. (2011). "The newsvendor problem: Review and directions for future research", European Journal of Operational Research 213, 361–374.
[17] Batey, M., Chamorro-Premuzic, T., and Furnham, A. (2009). "Intelligence and personality as predictors of divergent thinking: The role of general, fluid and crystallised intelligence", Thinking Skills and Creativity, Vol. 4, No. 1.
[18] Ones, D. S., Dilchert, S., Viswesvaran, C., and Judge, T. A. (2007). "In support of personality assessment in organizational settings", Personnel Psychology, Vol. 60, No. 4, p. 33.
[19] Kerr, W., Nanda, R., and Rhodes-Kropf, M. (2014). "Entrepreneurship as Experimentation", Journal of Economic Perspectives, Vol. 28, No. 3, 25–48.
[20] Zhao, H., Seibert, S., and Lumpkin, G. (2006). "The Big Five personality dimensions and entrepreneurial status: A meta-analytical review", Journal of Applied Psychology, Vol. 91, No. 2, 259–271.
[21] Levine, R., and Rubinstein, Y. (2018). "Selection into Entrepreneurship and Self-Employment", doi:10.3386/w25350.
[22] Manso, G. (2016). "Experimentation and the Returns to Entrepreneurship", The Review of Financial Studies, Vol. 29, No. 9, 2319–2340.
[23] Kalleberg, A., and Leicht, K. (1991). "Gender and Organizational Performance: Determinants of Small Business Survival and Success", Academy of Management Journal, Vol. 34, No. 1, 136–161.
[24] Shaver, K., and Scott, L. (1992). "Person, Process, Choice: The Psychology of New Venture Creation", Entrepreneurship Theory and Practice, Vol. 16, No. 2, 23–46.
[25] Helmers, C., and Rogers, M. (2019). "Innovation and the Survival of New Firms in the UK", Review of Industrial Organization, Vol. 36, No. 3, 227–248.
[26] Quatraro, F., and Vivarelli, M. (2014). "Drivers of Entrepreneurship and Post-entry Performance of Newborn Firms in Developing Countries", The World Bank Research Observer, Vol. 30, No. 2, 277–305.
[27] Astebro, T., and Chen, J. (2014). "The entrepreneurial earnings puzzle: Mismeasurement or real", Journal of Business Venturing, Vol. 29, No. 1, 88–105.
[28] Bernardo, A., and Welch, I. (2001). "On the Evolution of Overconfidence and Entrepreneurs", Journal of Economics and Management Strategy, Vol. 10, No. 3, 301–330.
[29] Koellinger, P., Minniti, M., and Schade, C. (2007). " 'I think I can, I think I can': Overconfidence and entrepreneurial behavior", Journal of Economic Psychology, Vol. 28, No. 4, 502–527.
[30] Caliendo, M., Fossen, F., and Kritikos, A. (2014). "Personality characteristics and the decisions to become and stay self-employed", Small Business Economics, Vol. 42, No. 4, 787–814.
[31] Ahmetoglu, G., Leutner, F., and Chamorro-Premuzic, T. (2011). "EQ-nomics: Understanding the relationship between individual differences in Trait Emotional Intelligence and entrepreneurship", Personality and Individual Differences, Vol. 51, No. 8, 1028–1033.
[32] Almeida, P., Ahmetoglu, G., and Chamorro-Premuzic, T. (2013). "Who Wants to Be an Entrepreneur? The Relationship Between Vocational Interests and Individual Differences in Entrepreneurship", Journal of Career Assessment, Vol. 22, No. 1, 102–112.
[33] Leutner, F., Ahmetoglu, G., Akhtar, R., and Chamorro-Premuzic, T. (2014). "The relationship between the entrepreneurial personality and the Big Five personality traits", Personality and Individual Differences, Vol. 63, 58–63.
[34] Rauch, A., and Frese, M. (2009). "A Personality Approach to Entrepreneurship", doi:10.1093/oxfordhb/9780199234738.003.0006.
[35] Collins, C., Hanges, P. J., and Locke, E. (2004). "The Relationship of Achievement Motivation to Entrepreneurial Behavior: A Meta-Analysis", Human Performance, Vol. 17, No. 1, 95–117.
[36] Borghans, L., Duckworth, A. L., Heckman, J. J., and ter Weel, B. (2008). "The Economics and Psychology of Personality Traits", The Journal of Human Resources, Vol. 43, No. 4, 972–1059.
[37] Kichuk, S. L., and Wiesner, W. H. (1997). "The big five personality factors and team performance: implications for selecting successful product design teams", Journal of Engineering and Technology Management, Vol. 14, No. 3–4.
[38] Kanfer, R., and Ackerman, P. L. (1989). "Motivation and cognitive abilities: An integrative/aptitude-treatment interaction approach to skill acquisition", Journal of Applied Psychology, 74 (4), 657–690.
[39] Hunter, J. E., and Hunter, R. F. (1984). “Validity and utility of alternative predictors of job performance”, Psychological Bulletin, 96(1), 72–98.
[40] Kuhn, M. (2008). "Building Predictive Models in R Using the caret Package", Journal of Statistical Software, 28 (5), 1–26. doi:10.18637/jss.v028.i05.
[41] Harrison J. S., Thurgood G. R., Boivie S., Pfarrer M. D. (2020) “Perception Is Reality: How CEOs’ Observed Personality Influences Market
Perceptions of Firm Risk and Shareholder Returns”, Acad. of Management Jour. 63, No. 4
[42] Bartholomae F., Wiens M. (2016) “Spieltheorie: Ein anwendungsorientiertes Lehrbuch”, p. 11
[43] Dubitzky W., Granzow M., Berrar D. (2007) “Fundamentals of data mining in genomics and proteomics”, Springer Sc. & Bus. Med. p. 178.
[44] Rencher, A. C., and Schaalje, G. B. (2007). "Linear Models in Statistics", doi:10.1002/9780470192610, Chap. 18, 507–516.