Content uploaded by Florian Sobieczky

Author content

All content in this area was uploaded by Florian Sobieczky on Jan 08, 2021

Content may be subject to copyright.


Available online at www.sciencedirect.com

Procedia Computer Science 00 (2019) 000–000

www.elsevier.com/locate/procedia

International Conference on Industry 4.0 and Smart Manufacturing

Explainability of AI-predictions based on psychological profiling

Simon Neugebauer (a), Lukas Rippitsch (a), Florian Sobieczky (b), Manuela Geiß (b)

(a) speed-invest heroes GmbH, Praterstr. 1, 1020 Vienna, Austria

(b) Software Competence Center Hagenberg GmbH (SCCH), Softwarepark 21, 4232 Hagenberg im Mühlkreis, Austria

Abstract

Using a local surrogate approach from explainable AI, a new method for predicting the performance of start-up companies based on psychological profiles is proposed. The method assumes the existence of an interpreted 'base model' whose predictions are enhanced by an AI-model delivering corrections that improve the overall accuracy. The surrogate (proxy) models the difference between the original (labeled) data and the data with labels replaced by the AI-corrections. As this corresponds to comparing the AI-correction applied before the base model is used with the original utilisation of the base model, the approach is called Before and After prediction Parameter Comparison (BAPC). The change of the base model under application of the AI-correction yields an interpretation of it by means of 'effective' parameter changes. This is useful for interpreting 'subjective' psychological profiles (such as 'risk-affinity', 'open-mindedness', etc.) in terms of effective changes of 'objective' monetary firm data (such as 'revenue', 'price of product', or 'cost of development').

© 2021 The Authors. Published by Elsevier B.V.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Peer-review under responsibility of the scientific committee of the International Conference on Industry 4.0 and Smart Manufacturing.

Keywords: Explainable AI; Local Surrogates; Small Corrections; BAPC; Psychological Profiling

1. Introduction

1.1. Problem: An analytical framework for XAI

Predictive modeling with machine learning can be successfully applied to the problem of estimating the performance of members of a market [13,14]. Recently, non-linear effects of combinations of features have been shown to be successfully included [7]. However, these enhancements in classifying a given company as probably successful or unsuccessful are usually delivered without explanation or interpretation [8] as to why the corrections are necessary. This paper presents a method for making such improvements over the predictions of a model with "interpretable" parameters (such as linear regression or a decision tree) explainable. The approach can be localised

E-mail addresses: Lukas.Rippitsch@speedinvest-heroes.com, Florian.Sobieczky@scch.at

1877-0509 © 2021 The Authors. Published by Elsevier B.V.


in the explainable AI (XAI) literature as a model-agnostic, local surrogate of the correcting machine learning model [15]. XAI has long been an important topic within machine learning [9,10] but has recently seen a strong increase of interest, mainly because of the legal consequences of AI-solutions that use highly accurate but black-box predictive analytics and fail to provide indications of the cause of malfunction [1]. The usefulness of predictive modeling in industry as a manufacturing process management tool aimed merely at increasing predictive accuracy is questionable in applications with many potential avenues for improving process performance. If there is no hint as to how the improvement of the prediction comes about, there is no prescription for a systematic change of the underlying standard procedure that would remove the cause of the need for correction [11]. In the light of these observations, we provide a solution that makes specifically those non-linear machine learning models interpretable which provide small corrections to interpretable regression base models. The interpretations are delivered in the form of characteristic changes of the base model's parameters. This procedure is typical for prescriptive maintenance, in which "to tune the machine configuration towards less likely faults to happen" (see [12], Sect. 2) is an Industry 4.0 "methodological criterion".

1.2. Predicting the success of start-up enterprises

The use of psychometric profiles to predict the success of potential employees of large companies is well documented [17,18]. A meta-study by Kerr et al. (2017) [19] found significant correlations between certain personality traits and start-ups' financial success and chance of survival. These analyses refer to business performance (multiple R = .31) and entrepreneurial intentions (multiple R = .36). Regarding the role of the psychological data, the strongest effect of personality traits on predicting entrepreneurial activities was exercised by 'openness', followed by 'conscientiousness', 'extroversion' and 'sociability' [20]. Similarly, predictions can be made about the income of founders [21,22]. According to various studies, entrepreneurs can fail despite having financial means, a convincing idea and an excellent qualification, if the necessary personality traits are not present [23,24]. This has a significant societal impact, since half of all prospective entrepreneurs fail within the first five years [25,26]. The economic and psychological costs of entrepreneurial failure, such as the loss of savings, over-indebtedness or unemployment, could be reduced if people who are unsuitable for entrepreneurship were given proper advice or maintained their status as employees. The skills needed to start up and run a business also play an important role, as previous research has shown that there are large differences in the extent of such skills [27] and that they are systematically overestimated and misclassified [28,29,19]. It has also been shown that entrepreneurial skills correlate positively with some personality factors [30,21,22]. However, these studies have shown that narrower personality traits calibrated to the start-up sector are more precise predictors, e.g. META (measure of entrepreneurial tendencies and abilities) [31,32,33].

Thus, empirical data prove the usefulness of new tools and items calibrated to the start-up area for predicting business activities and success [34,35]. In addition to personality traits, as captured by the Big Five model [37] in the studies presented, some motivational factors differ significantly between successful founders, unsuccessful founders, and non-founders. In particular, a high motivation to perform ('need for achievement') among founders has a long-term positive effect on the success of start-ups [35]. There is also a lot of potential in studying the cognitive abilities of founders, as cognitive ability is a powerful predictor of economic outcomes and at the same time is woefully understudied in the start-up sector. It is intuitively obvious that cognitive ability is fundamental to processing information, decision making and learning [36].

1.3. Structure

This paper is structured to answer the question of how machine learning models that add corrections to the predictions of simple, interpretable 'base models' can be explained in terms of the parameters of these base models. In Section 2, we introduce our approach (BAPC) and its general mathematical concept. Section 3 discusses its applicability to the problem of predicting the economic success of start-up companies and the influence of the psychological profile data of their decision makers (CEOs, employees, etc.). Section 4 contains the discussion of the results, particularly comparing different 'correcting' AI-models.


Fig. 1. Schematics of BAPC: For the prediction of the outcome Y of a given system, labeled data ⟨X_i, Y_i⟩ are handed to an interpretable base model f(·; θ) with parameter θ (blue box). This yields a residual error ε which is used as the labels of the AI-corrector's training data ⟨X_i, ε_i⟩, additionally improving the accuracy of the prediction. In order to deliver an interpretation of the additional correction in terms of a change in the parameter θ, another instance of f(·; θ′) is trained with adapted (corrected) labels Y_i − ε̂_i (red box). The resulting change towards the effective parameter θ′ is taken as an explanation of the AI-model's correction.

2. ’Small’ AI-corrections

2.1. Before and After prediction Parameter Comparison (BAPC, [3])

We consider ⟨X_i, Y_i⟩ with i ∈ {0, …, n−1}, a labeled training set in X × Y = R^{d+1}, with X a d-dimensional instance space and one-dimensional labels Y_i ∈ Y.

Step 1 - First application of the base model: The base model is a function f_θ : X → Y, with θ a vector of parameters having a well-defined geometric meaning:

Y_n = f_θ(X_n) + ε_n.   (1)

The standard examples are linear regression (minimisation of the sum of squared residuals, Y = R and θ ∈ R^{d+1}) and probabilistic classification, i.e. Y = [0, 1] and θ ∈ R^2 (e.g. probit or logistic regression).

Step 2 - Application of the AI-correction: In addition to the base model, some non-interpretable [9] supervised machine learning model A_η : X → Y is used, with training data η = {⟨X_i, ε_i⟩}_{i=0}^{n−1}, where ε_i = Y_i − f_θ(X_i) is the residual error of the base model (therefore, A_η is also called the difference model).

The complete prediction is x ↦ f_θ(x) + A_η(x), and thus the equation for the label is given by

Y_n = f_θ(X_n) + A_η(X_n) + Δε_n,   (2)

in which the AI-model A_η(X_n) = ε̂_n estimates the residual ε_n = Y_n − f_θ(X_n), thereby adding accuracy to the predictions of the base model f_θ(·).

Step 3 - Second application of the base model: Now, in order to render the AI-part of the prediction interpretable, another version of the base model is fitted, but to a different set of data. Namely, the new training set {⟨X_i, Y_i − ε̂_i⟩}_{i=0}^{n−1} is used just as in Step 1 in the base model, again:

Y_n − ε̂_n = f_{θ′}(X_n) + ε′_n.   (3)

In this way the correction is already applied (to Y_n) before the base model f_{θ′}(·) is fitted, yielding a different vector of fitting parameters (θ′). The difference of the two parameter vectors can be interpreted in the context of the base model and thus delivers an interpretation (or explanation) of the effect of the AI-model A_η(·). Thus, explainability of A_η is provided locally at x by comparing the base model's action on x with the parameters θ′ and θ - before and after the correction is applied (hence the name 'Before and After correction Parameter Comparison' - BAPC; see Figure 1 for a flow diagram of the procedure).
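As a concrete illustration of the three steps, the following minimal sketch runs BAPC with a one-dimensional OLS base model and, standing in for the non-interpretable corrector A_η, a simple k-nearest-neighbour regressor (an illustrative assumption; the paper itself uses random forests and neural networks in Section 3.5):

```python
import random

def ols_fit(xs, ys):
    """Fit y = a*x + b by ordinary least squares; theta = (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def knn_corrector(train_x, train_eps, k=5):
    """Toy stand-in for the AI-corrector A_eta: k-NN regression on residuals."""
    pairs = list(zip(train_x, train_eps))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(e for _, e in nearest) / k
    return predict

# Synthetic data: a linear trend plus a localized non-linear bump that the
# base model cannot capture.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + (1.5 if 4.0 < x < 6.0 else 0.0) + random.gauss(0.0, 0.1)
      for x in xs]

# Step 1: first application of the base model; record residuals eps_i.
theta = ols_fit(xs, ys)
eps = [y - (theta[0] * x + theta[1]) for x, y in zip(xs, ys)]

# Step 2: train the corrector on <X_i, eps_i>.
A = knn_corrector(xs, eps)

# Step 3: refit the base model on the corrected labels Y_i - eps_hat_i.
theta_prime = ols_fit(xs, [y - A(x) for x, y in zip(xs, ys)])

# BAPC reads the shift theta -> theta_prime as the 'effective' explanation
# of what the corrector A_eta does.
delta_theta = (theta_prime[0] - theta[0], theta_prime[1] - theta[1])
```

The shift (Δa, Δb) is interpretable in the base model's own terms (change of slope and intercept), which is exactly the kind of explanation BAPC delivers.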


The idea of 'explaining' a small deviation from the state of a well-interpreted system by the parameters of that system requires that the change not be so dramatic as to render the description of the changed system by these parameters meaningless. A 'small change' is understood in the sense of a small transition from an initial to a target state such that for all intermediate states the direction of the change remains essentially the same. If the states of the system are given by a suitable subset of a vector space of parameters, then the difference vector of these parameters is representative of the set of states the system traverses during the transition.

2.2. Comparing Linear Regression Models

If the base model is the linear model f_θ(x) = θ·x, then the ordinary least squares regression criterion is used to determine θ from a given sample. Furthermore, different variants of robust regression methods (such as regularisation and quantile regression) can be used to obtain quite tractable expressions for the change of the parameter vector θ to θ′. This approach is taken in a forthcoming paper and will require a larger systematic examination of the performance of BAPC in terms of the so-called fidelity [10] of the method's local surrogate nature.

2.3. Comparing classiﬁcation models

A typical property of decision trees (such as those used in CART) is the varying order in which the features are selected to split the sample space. To compare parameter changes from θ to θ′, it is necessary that the parameters retain their interpretation when the base model is applied for the second time. In order to guarantee this, the order of feature selection from the first application is recorded and reused deliberately in Step 3. In this way, the qualitative structure of the sample-space splitting remains the same, while the numerical values can be used to explain the correction of the non-interpretable machine learning model A. We will not use this approach here and postpone the development of BAPC for classification scenarios.

3. Application: Psychological profiles of newsvendors

3.1. Expectations about the interpretation of psychological profiles

We propose a method to explain psychological profile data used within a random forest model to improve the predictions of a base model built on key performance indicators. The relative importance of a trait varies by the task studied. Cognitive traits are indicative of performance in a greater variety of tasks. Personality traits are important in explaining performance in specific tasks, although different personality traits are predictive in different tasks. The nature of traits, in particular their remaining largely unchanged over a lifetime, allows them to be used as consistent predictors, depending on the situational context. Linking the specific variables with economic outcomes, namely the ventures' key performance indicators (KPIs), will yield clarity about the usefulness of each variable per se and about potential moderators.

The goal of the present study is to carry out modeling that uses variables of multiple kinds:

1. KPIs (firm-internal data)
2. Coded variables about the product type, market niche, innovativeness, market size etc. (available from Techcrunch/Crunchbase.com)
3. Personality factors (Big Five)
4. Motivational factors
5. Cognitive ability
6. Comprehensive start-up specific variables (e.g. individual innovativeness)
7. Socio-demographic data

Among these, several are not directly measurable and qualify as moderators or mediators, or transmit other types of features with an indirect effect. We will particularly concentrate on scenarios in which the effects of variables of types 2 to 7 are small in comparison with those of type 1. In particular, we will assume that it is possible to produce the same predictions as a model including all variable types with a 'smaller model' using only effective KPIs.

In this sense, our setting is that of dominant measurable (i.e. 'objective') variables vs. a set of significant but less dominant latent (i.e. 'subjective') variables. For example, while a variable such as 'risk-affinity' is a well-defined decision-theoretic quantity [42], it may depend on personality factors, such as 'openness to experience' or 'neuroticism', in a complicated way which, in the modeling approach, is only accessible to non-linear machine learning ('AI') methods. The specific dependency of a company's success on 'risk-affinity' in the sense of a psychological profile may therefore be too complicated to be explicitly revealed. However, one may hope that, using variables of type 1, the same success rate as the observed one can be produced by a simpler interpretable model in which only KPI-variables are used (see Discussion).

Accounting for the moderators and making predictions based on team-level analysis is what makes this research/model unique and novel from 1) a psychological perspective and 2) within the start-up and venture capital ecosystem. Prior research points out that 'overanalyzing' the individual to predict success in start-ups has some downsides, and at the same time that the team's future performance is key. Looking at the aforementioned factors at an organizational or team level is rare in the literature, as it is harder to collect data and requires greater resources. For example, how a team approaches its development and tasks varies based on the context (volatile, stagnant, or gradually changing), organizational constraints (how much is the company driven by regulation versus market needs?), operational characteristics (multinational, government-run, or locally confined), and the line of business in which the team resides. Recent research suggests great potential, as studies found that successful teams were characterized by higher levels of general cognitive ability, higher extraversion, higher agreeableness, and lower neuroticism than their unsuccessful counterparts. In successful teams, the heterogeneity of conscientiousness was negatively related to increments in product performance [37].

Furthermore, in situations where it is desirable to predict how well a team will perform, it appears more valuable to know the mean level of cognitive ability of its members than the score of the highest or lowest scoring individual. Perhaps the most important finding of this study, however, was the strong evidence of moderation affecting the team-level relationship for all three operational definitions associated with level. With regard to composing work groups, the main analysis indicates that the relationship between mean cognitive ability and team performance varies across situations. Few empirical studies in the literature have examined potential moderators, but theory and empirical research suggest a potential role for task complexity. Given that task complexity moderates the relationship between cognitive ability and performance for individuals (Hunter and Hunter [39]), team-level indices of cognitive ability may be more strongly related to team performance on complex tasks than on simple tasks.

Finally, regarding task familiarity, research by Kanfer and Ackerman [38] suggests that the relationship between cognitive ability and task performance decreases over time for individuals as they acquire more experience with a task. Essentially, they argue that cognitive ability is important in the early stages of learning a new task but becomes progressively less important as knowledge is acquired and skills become proceduralized. Extending this argument to the team level, the correlation between team-level cognitive ability and team performance should be highest for novel tasks and should decrease over time (i.e., with repetitions or cycles).

In summary, it is impossible to neglect the influence of psychological factors when estimating the likelihood of success of small companies that are in the process of establishing their position in the market. It is our intention to provide an explainable modeling approach in which the 'subjective' nature of latent variables is evaded by mapping their effect onto a space in which only KPIs are used. The effective KPIs then 'explain', for single instances (i.e. locally), what can globally only be modeled by including all types of variables, possibly only with a non-linear, non-interpretable machine learning model.

3.2. Case Study: Success of newsvendors

In order to demonstrate how BAPC can be used to interpret the action of an otherwise non-interpretable machine learning method, a caricature model is considered to generate synthetic data representing a start-up company's actions in the market: the paradigm of the 'newsvendor problem', in which initially some amount q of stock (newspapers) is bought at item cost c and may or may not be sold entirely during the day at item price p, depending on the demand D, a random variable. If too much stock is acquired (D < q), the investment is not all turned into profit, while if too little stock has been bought (q < D), less profit is made than the prevailing demand allows.


The newsvendor model [4,6] is one of the most often quoted paradigms in operations management. Its first appearance dates back to work from 1888 by Edgeworth [5] on the optimal amount of bank reserves. It has found numerous applications in applied economics [16].

A recent discussion [4] reveals several different approaches to solving the newsvendor problem. Among these, the method introduced by [7] considers a linear machine learning model to which non-linear combinations of features are added. As described in Section 2, BAPC follows a similar approach in order to explain the non-linear corrections provided by a "non-interpretable" AI-model in terms of the interpretable parameters of the underlying base model.

3.3. The risk-affine newsvendor

Similar to the specific newsvendor setting, let p be the price at which an item the company produces can be sold, and c its production cost. Let q be the number of items produced in a unit time interval (say, a month). Furthermore, let F(x) = P[D ≤ x] for x ≥ 0 be the cumulative distribution function of a random variable D representing the demand for the considered firm's product.

Let 𝒟 := {D_i}_{i=0}^{n−1} be an i.i.d. n-sized random sample of demands, and F̂_n the cumulative distribution function of the empirical measure belonging to the sample. As is well known [6], with F being the exact cumulative distribution function of all D_i, the critical fractile q* is the optimal amount q of stock maximising the profit function π : R_+ × R_+ → R, given by

π(q, D) := p · min(D, q) − c · q.   (4)

This q* is the exact quantile function evaluated at 1 − c/p = (p − c)/p, the fraction determined by the costs of unmet demand and unsold items. By q̂* we denote the estimate of the critical fractile, given by

q̂* := F̂⁻¹(1 − c/p).   (5)

In order to use complete symbolism, these estimated quantities would have to be expressed in dependence on the sample size n (like F̂_n etc.), as well as on some element ω of the appropriate underlying incidence space Ω (like q̂*(λ) instead of q̂*_n(λ, ω)). However, as it is clear that 𝒟 is a sample of fixed size and the quantities depending on it are (derived) random variables, we omit such details for the sake of conciseness (see [3] for more formal detail).
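For illustration, the estimator (5) can be read off directly from the sorted sample. The following sketch assumes the plain order-statistic quantile convention (one of several conventions; the function name is ours):

```python
import math
import random

def empirical_critical_fractile(demands, c, p):
    """q_hat* = F_hat^{-1}(1 - c/p) from a demand sample, cf. Eq. (5).
    Plain order-statistic estimator: the smallest order statistic whose
    empirical CDF reaches the level 1 - c/p."""
    assert 0 < c < p, "a profitable item needs cost below price"
    level = 1.0 - c / p
    srt = sorted(demands)
    k = max(0, math.ceil(level * len(srt)) - 1)
    return srt[min(k, len(srt) - 1)]

# Example: exponential demand with rate p*lambda = 2 (the Section 4.1
# setting); the exact critical fractile is log(p/c)/(p*lambda) ~ 0.347.
random.seed(1)
sample = [random.expovariate(2.0) for _ in range(10000)]
q_hat = empirical_critical_fractile(sample, c=1.0, p=2.0)
```

With c = 1 and p = 2 the level is 1/2, so q̂* is simply the sample median of the demands.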

3.4. Preparation of the artificial data set

In order to have a synthetic data set which carries the features we wish to detect with BAPC, we

1. choose fixed constants c for cost and p for price,
2. draw a sample 𝒟 of observed demands D_i, i ∈ {0,…,n−1}, where n represents the number of months for which the company has stored data,
3. for each i determine the number of items produced: q_i := q̂*_i + r_i, where q̂* is the estimated critical fractile of the sample, and r_i is a deviation from the optimal q̂*_i arising from the specific (personal) strategy of the company's decision maker (expressing risk-affinity or risk-aversion, cf. [4,16]),
4. calculate the resulting profits π_i := π(q_i, D_i),
5. and calculate S_i, the indicator of whether the company was successful in the i-th month: S_i = 1 if π(q_i, D_i) > 0, and S_i = 0 if there was no increase in profit, i.e. π(q_i, D_i) ≤ 0.

The r_i are evidently 'perturbations' of the estimated critical fractile q̂*, which we inject so as to have data that the additional (non-interpretable) machine learning model of Step 2 can correct.
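The five preparation steps can be sketched as follows (illustrative assumptions: exponentially distributed demand as in Section 4.1, a two-valued perturbation r_i ∈ {0, +1}, the simple order-statistic quantile estimator, and a function name of our choosing):

```python
import math
import random

def make_newsvendor_data(n=100, p=2.0, c=1.0, lam=1.0, seed=0):
    """Steps 1-5: demands, perturbed order sizes, profits, success labels."""
    rng = random.Random(seed)
    # Step 2: i.i.d. demands, exponential with rate p*lam (Section 4.1).
    D = [rng.expovariate(p * lam) for _ in range(n)]
    # Estimated critical fractile q_hat* at level 1 - c/p (Eq. 5).
    level = 1.0 - c / p
    q_hat = sorted(D)[max(0, math.ceil(level * n) - 1)]
    # Step 3: perturbations r_i: half 0 (optimal), half +1 (risk-favouring).
    r = [0.0] * (n // 2) + [1.0] * (n - n // 2)
    rng.shuffle(r)
    q = [q_hat + ri for ri in r]
    # Step 4: profits pi_i = p*min(D_i, q_i) - c*q_i (Eq. 4).
    profit = [p * min(Di, qi) - c * qi for Di, qi in zip(D, q)]
    # Step 5: success indicator S_i = 1 iff the month was profitable.
    S = [1 if pi > 0 else 0 for pi in profit]
    return D, q, profit, S

D, q, profit, S = make_newsvendor_data()
```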

3.5. BAPC applied to probabilistic regression

We now carry out the program suggested in Section 2.1. BAPC is a regression approach, so that for binary classification it is necessary to consider a numerical variable indicating the probability of success. For our base model we choose the (non-linear) probabilistic regression model f_θ : R_+ → [0,1], where θ = λ and f_λ(D_i) = Ŝ_i(λ) (the 'hat' indicates that the estimation of q* is involved), with the link function [44]

Ŝ_i(λ) = (1/|N(i)|) Σ_{D ∈ N(i)} 1_+( π(q̂*(λ), D) ),   (6)

where 1_+(·) is the indicator function of the positive real numbers R_+ and N(i) = {D ∈ 𝒟 : |D − D_i| < δ}. Note the dependency on δ, which is suppressed in the notation for the sake of conciseness. We call Ŝ_i(λ) a success indicator and explicitly require that its values remain in [0,1]. As the non-interpretable AI-correction model A_η(D), we choose the non-linear machine learning models 'rf' and 'nnet' from the R library 'caret' [40]. The training data η is the set {⟨D_i, Lgt(ε_i)⟩}_{i=0}^{n−1}, with ε_i = S_i − Ŝ_i(λ_n) for i ∈ {0,…,n−1} and Lgt(x) = log((1 + x)/(1 − x)) (the logit function scaled to x ∈ [−1,1]). The parameter λ_n is determined in Steps 1 and 3 by minimising the residual errors ε_i and ε′_i. Let ε_n = (ε_0, …, ε_{n−1})^T and ε′_n = (ε′_0, …, ε′_{n−1})^T be the vectors of residual errors of the respective training sets.

1. First base model application:

S_n = Ŝ_n(λ_n) + ε_n,  where λ_n = arg min_λ ‖ε_n‖².   (7)

This means Ŝ_n is calculated and the residual error ε_n is recorded.

2. Application of the AI-correction:

With the data set {⟨D_i, Lgt(ε_i)⟩}_{i=0}^{n−1}, a random forest model (or neural network 'nnet') A_η(D) for the residuals is trained as a function of D. The back-transformed estimate ε̂_n is truncated, if necessary, such that S_n − ε̂_n ∈ [0,1].

3. Second base model application:

S_n − ε̂_n = Ŝ_n(λ′_n) + ε′_n,  where λ′_n = arg min_λ ‖ε′_n‖².   (8)

The base model is applied again, and a different parameter λ′_n is obtained.

The difference vector Δλ_n := λ′_n − λ_n is recorded and can be interpreted as the correction of A_η(D_n) applied to the base model prediction S_n. It is important to note that the correction depends on D_n and is therefore to be interpreted locally for each instance. For a different value of D_n there may be not only a different magnitude of ε̂_n, but also a different interpretation Δλ_n = λ′_n − λ_n. As discussed below, the local nature of BAPC is also reflected in aggregate assessments, e.g. when different values of δ imply different values of the success indicator.
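A minimal sketch of the link function (6) and the fit of λ_n: for compactness we use the exact critical fractile q*(λ) of the exponential demand law from Section 4.1 instead of the empirical q̂*(λ), and fit λ by a coarse grid search rather than a continuous minimiser (both simplifying assumptions):

```python
import math

def critical_fractile(p, c, lam):
    """Exact q*(lambda) for exponential demand with rate p*lambda (Sec. 4.1)."""
    return math.log(p / c) / (p * lam)

def success_indicator(i, D, lam, p, c, delta=0.1):
    """Link function (6): share of profitable outcomes among demands near D_i."""
    q = critical_fractile(p, c, lam)
    nbhd = [d for d in D if abs(d - D[i]) < delta]  # N(i); always contains D_i
    return sum(1 for d in nbhd if p * min(d, q) - c * q > 0) / len(nbhd)

def fit_lambda(D, S, p, c, grid, delta=0.1):
    """Steps 1/3: pick lambda minimising the squared residual norm ||eps||^2."""
    def sse(lam):
        return sum((S[i] - success_indicator(i, D, lam, p, c, delta)) ** 2
                   for i in range(len(D)))
    return min(grid, key=sse)

# Tiny usage example with four demand observations and p = 2, c = 1:
D = [0.1, 0.2, 0.5, 1.0]
S = [0, 1, 1, 1]
lam_fit = fit_lambda(D, S, p=2.0, c=1.0, grid=[0.5, 1.0, 2.0], delta=0.05)
```

The same `fit_lambda` call serves both Step 1 (on the original labels S_i) and Step 3 (on the corrected labels S_i − ε̂_i), and the difference of the two fitted values is Δλ.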

4. Results and Discussion

4.1. Use Case: The exponentially distributed demand

Fig. 2. Top row - Left: The true gain (= profit) of a sample of 50 optimally chosen (Black) orders q_i, together with 50 further perturbed orders (Blue) chosen too high (r_i = +1). Center: Same, versus demand D_i. Right: Same, versus true success. Lower row - Left: Performance of the trained corrector A_η. Center: Success estimator before and with added estimated correction (Red); the values shift to the right for π(q̂(λ*), D_i) > 0 and mostly to the left otherwise. Right: Fitted success estimator. The fact that more unperturbed successful companies (Black) are estimated to be more successful than in the case of Ŝ_i(λ) shows the successful application of the correction ('Before Correction' has the same effect as 'After Correction').

For the sake of illustration, we choose an exponential distribution with rate p·λ for the demand. This models the decline of demand with rising price by a linear function. It yields a critical quantile of q* = (1/(p·λ)) log(p/c). When sampling the demand, we choose p = 2, c = 1, λ = 1, and in Step 3 of the data-preparation process (Section 3.4) set half of the perturbations r_i to 0 (Black) and the other half to +1 (Blue), corresponding to a risk-favouring order quantity in which a higher maximal profit is possible under the constraint of a smaller expected profit (see Figure 2, top left, for a sample of size n = 100). The unperturbed and perturbed data are also plotted versus the demand (center) and versus the true success (right). In the center picture, the critical quantile of about 0.34 is seen to occur at the bend of the unperturbed (Black) curve. In the lower row of Figure 2, the correction (left) by the random forest or neural network model A_η, giving the estimated residual error ε̂_i, is shown to be positive exclusively for some unperturbed values and negative exclusively for some perturbed values. Adding this correction to the success estimator (center: Black for unperturbed, Blue for perturbed) yields values (Red) which help successful incidents to have a larger predicted success indicator and unsuccessful ones to have a lower prediction of success. The condition that the residual norms ‖ε_n(λ)‖² of Step 1 (and ‖ε′_n(λ)‖² of Step 3) be minimal, thereby determining λ* and λ′* (Figure 3, left), results in a difference between these two parameters. The histogram of an experiment with 100 repetitions (Figure 3, center) gives evidence of the typical parameter shift from λ* to λ′*: it is downwards. A lower parameter λ corresponds to a higher critical quantile (1/(λp)) log(p/c). This exhibits that the original data containing the sub-optimal orders on average perform as if there were less demand than in the corrected case. In this way, the interpretation of what the correction A_η achieves on average is delivered in the form of the expected parameter shift in the underlying statistical model: it is a larger effective demand.
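The quoted bend at about 0.34 can be checked directly from the exponential law: solving F(q) = 1 − exp(−pλq) = 1 − c/p gives q* = log(p/c)/(pλ), i.e. for p = 2, c = 1, λ = 1:

```python
import math

# Critical quantile of the exponential demand law with rate p*lam:
# F(q) = 1 - exp(-p*lam*q) = 1 - c/p  =>  q* = log(p/c) / (p*lam)
p, c, lam = 2.0, 1.0, 1.0
q_star = math.log(p / c) / (p * lam)  # = 0.5 * log(2), roughly 0.3466
```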

Fig. 3. Left: Typical minimization result for ‖ε‖ and ‖ε′‖, yielding λ (Black) and λ′ (Blue). Red curves are the Loess-smoothed residual errors (span = 0.25). The error ‖ε′‖ is typically larger, as it corresponds to the testing error, while ‖ε‖ corresponds to the training error. Center: Histograms of the parameter shift λ′* − λ* for 100 iterations of Monte Carlo cross-validation with samples of size 100 and δ = 0.1, for 'rf' (Light Blue) and 'nnet' (Red). Right: The neighborhood-determining parameter δ is found by minimisation of the standard deviation of the parameter shift. The optimal δ occurring at this 'intermediate' value shows the eminent role of locality in BAPC: using aggregate local success estimations in a specific neighborhood of D_i increases precision versus using isolated point data for the training process (δ = 0).


4.2. Discussion of the Experiment

After producing a data set of demands D_i, order sizes q_i, and success indicators S_i of size 2N, with half of the entries of q̂_i(λ) 'perturbed' by the value r_i, a random subset of size N is chosen as a training set. We use stratified Monte Carlo cross-validation [43], in which the data are randomly split into two equally sized folds, each containing the same number of perturbed values. One fold is used as a training set for the correction model, which is then applied to the other fold, for which the prediction Ŝ(λ) and correction A_η are performed. This process is repeated n times and the average of the results for λ′* − λ* is reported. The neighbourhood-generating parameter δ has been found to be optimal at 0.1 by means of variance minimisation of the empirical parameter-shift distribution. The use of a statistical model with a parametrized distribution as base model closely resembles the probit model for binary classification; however, the link function (here: Ŝ_i(λ)) is not linear in λ [44]. The correcting model A_η ('rf' or 'nnet' from R's caret library) is trained on the values obtained from inserting the residual errors into the logit function. The prediction is transformed back onto the original scale of the success indicators, so that Ŝ_i ∈ [−1, 1]. This step from logit regression has proven to be essential to obtain sufficiently accurate estimates of λ′* and λ*. For the specific application of the method presented here, it remains to be explained how the psychological profile characteristics (2. to 7. in the list of Section 3.1) should be used to model risk-welcoming behaviour. For this we refer to the discussion of 'neuroticism' [41] and 'sociability' [20] in connection with the newsvendor model.
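The split-and-transform steps described above can be sketched in Python. The paper's implementation uses R with the caret library; the function names, the boolean perturbation mask, and the exact [−1, 1] ↔ logit scaling below are our illustrative assumptions.

```python
import math
import random

rng = random.Random(1)

def stratified_mccv_split(perturbed, rng):
    """One stratified Monte Carlo CV split: two equally sized folds,
    each containing the same number of perturbed entries.  `perturbed`
    is a list of booleans marking which samples carry the perturbation r_i."""
    idx_p = [i for i, p in enumerate(perturbed) if p]
    idx_u = [i for i, p in enumerate(perturbed) if not p]
    rng.shuffle(idx_p)
    rng.shuffle(idx_u)
    fold_a = sorted(idx_p[: len(idx_p) // 2] + idx_u[: len(idx_u) // 2])
    fold_b = sorted(idx_p[len(idx_p) // 2:] + idx_u[len(idx_u) // 2:])
    return fold_a, fold_b

# Residuals lie in [-1, 1]; map them to (0, 1), take the logit, train
# the correction model on that scale, and map its prediction back so
# the corrected success indicator stays within [-1, 1].
def to_logit_scale(residual):
    return math.log((1.0 + residual) / (1.0 - residual))

def from_logit_scale(z):
    return 2.0 / (1.0 + math.exp(-z)) - 1.0

perturbed = [True] * 4 + [False] * 4
fold_a, fold_b = stratified_mccv_split(perturbed, rng)
print(sum(perturbed[i] for i in fold_a),
      sum(perturbed[i] for i in fold_b))  # prints "2 2"
```

One fold would then serve as training input for the correction model and the other as the evaluation fold; repeating the split n times and averaging the resulting parameter shifts reproduces the procedure described above.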

5. Conclusion

We have shown that explainability of a machine learning model acting as a corrector to a parametric base model can be established in the framework of model-agnostic, local surrogates. The results are presented in terms of the interpretations that the parameter shifts of the base model provide. For the case of predicting the success of start-up companies, the method of BAPC connects data of psychological profiles (represented by the machine learning model) with 'hard' KPI business variables via the parameter changes of the base model. Our outlook is to apply BAPC to various predictive maintenance tasks on production data, such as 'perturbed' time series of sensor readings. The ability to place high-performing machine learning models (such as deep learning) into the framework of a conventional base model (such as probit regression) is highly attractive, as in this case the base model is likely to emerge from an understanding of the underlying physical processes.

Acknowledgements

This work has been supported by the project 'inAIco' (FFG Project No. 862019; Bridge Young Scientist, 2020), as well as by the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research and Economy, and the Province of Upper Austria in the frame of the COMET center SCCH.

References

[1] Turek, M. (2016). "Explainable Artificial Intelligence (XAI), DARPA-BAA-16-53": Proposers Day Slides, https://www.darpa.mil/program/explainable-artificial-intelligence.
[2] Filippini, M., and Hunt, L. C. (2011). "Energy demand and energy efficiency in the OECD countries: a stochastic demand frontier approach", Energy Journal 32 (2): 59–80.
[3] Sobieczky, F. (2020). "A local surrogate model for explainable machine learning corrections using before and after estimation parameter comparison (BAPC)", (preprint).
[4] Schweitzer, M. E., and Cachon, G. P. (2000). "Decision bias in the newsvendor problem with a known demand distribution: Experimental evidence", Management Science 43 (3): 404–420.
[5] Edgeworth, F. Y. (1888). "The Mathematical Theory of Banking", Journal of the Royal Statistical Society 51 (1): 113–127.
[6] Porteus, E. L. (2008). "The newsvendor problem", in D. Chhajed and T. J. Lowe (eds.), Building Intuition: Insights From Basic Operations Management Models and Principles, chapter 7, 115–134. Springer.
[7] Rudin, C., and Vahn, G.-Y. (2013). "The big data newsvendor: Practical insights from machine learning analysis", Cambridge, Mass.: MIT Sloan School of Management.
[8] Došilović, F. K., Brčić, M., and Hlupić, N. (2018). "Explainable artificial intelligence: A survey", in 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO).


[9] London, A. J. (2019). "Artificial Intelligence and Black-Box Medical Decisions: Accuracy versus Explainability", Hastings Center Report, Vol. 49, No. 1, 15–21.
[10] Carvalho, D. V., Pereira, E. M., and Cardoso, J. S. (2019). "Machine Learning Interpretability: A Survey on Methods and Metrics", Electronics, Vol. 8, No. 8, p. 832.
[11] Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., Chatila, R., and Herrera, F. (2020). "XAI: Concepts, taxonomies, opportunities and challenges toward responsible AI", Information Fusion 58, 82–115.
[12] Diez-Olivan, A., Del Ser, J., Galar, D., and Sierra, B. (2019). "Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0", Information Fusion 50, 92–111.
[13] Doloi, H. (2009). "Analysis of pre-qualification criteria in contractor selection and their impacts on project success", Construction Management and Economics, 27:12, 1245–1263.
[14] Altman, E. I. (1968). "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy", Journal of Finance, Vol. 23 (4), 589–610.
[15] Molnar, C. (2019). "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable", e-book, leanpub.com, Chap. 5.7.
[16] Qin, Y., Wang, R., Vakharia, A. J., Chen, Y., and Seref, M. M. H. (2011). "The newsvendor problem: Review and directions for future research", European Journal of Operational Research 213, 361–374.
[17] Batey, M., Chamorro-Premuzic, T., and Furnham, A. (2009). "Intelligence and personality as predictors of divergent thinking: The role of general, fluid and crystallised intelligence", Thinking Skills and Creativity, Vol. 4, No. 1.
[18] Ones, D. S., Dilchert, S., Chokalingham, V., and Judge, T. A. (2007). "In support of personality assessment in organizational settings", Personnel Psychology, Vol. 60, No. 4, p. 33.
[19] Kerr, W., Nanda, R., and Rhodes-Kropf, M. (2014). "Entrepreneurship as Experimentation", Journal of Economic Perspectives, Vol. 28, No. 3, 25–48.
[20] Zhao, H., Seibert, S., and Lumpkin, G. (2006). "The Big Five personality dimensions and entrepreneurial status: A meta-analytical review", Journal of Applied Psychology, Vol. 91, No. 2, 259–271.
[21] Levine, R., and Rubinstein, Y. (2018). "Selection into Entrepreneurship and Self-Employment", doi:10.3386/w25350.
[22] Manso, G. (2016). "Experimentation and the Returns to Entrepreneurship", The Review of Financial Studies, Vol. 29, No. 9, 2319–2340.
[23] Kalleberg, A., and Leicht, K. (1991). "Gender and Organizational Performance: Determinants of Small Business Survival and Success", Academy of Management Journal, Vol. 34, No. 1, 136–161.
[24] Shaver, K., and Scott, L. (1992). "Person, Process, Choice: The Psychology of New Venture Creation", Entrepreneurship Theory and Practice, Vol. 16, No. 2, 23–46.
[25] Helmers, C., and Rogers, M. (2019). "Innovation and the Survival of New Firms in the UK", Review of Industrial Organization, Vol. 36, No. 3, 227–248.
[26] Quatraro, F., and Vivarelli, M. (2014). "Drivers of Entrepreneurship and Post-entry Performance of Newborn Firms in Developing Countries", The World Bank Research Observer, Vol. 30, No. 2, 277–305.
[27] Astebro, T., and Chen, J. (2014). "The entrepreneurial earnings puzzle: Mismeasurement or real?", Journal of Business Venturing, Vol. 29, No. 1, 88–105.
[28] Bernardo, A., and Welch, I. (2001). "On the Evolution of Overconfidence and Entrepreneurs", Journal of Economics and Management Strategy, Vol. 10, No. 3, 301–330.
[29] Koellinger, P., Minniti, M., and Schade, C. (2007). "'I think I can, I think I can': Overconfidence and entrepreneurial behavior", Journal of Economic Psychology, Vol. 28, No. 4, 502–527.
[30] Caliendo, M., Fossen, F., and Kritikos, A. (2014). "Personality characteristics and the decisions to become and stay self-employed", Small Business Economics, Vol. 42, No. 4, 787–814.
[31] Ahmetoglu, G., Leutner, F., and Chamorro-Premuzic, T. (2011). "EQ-nomics: Understanding the relationship between individual differences in Trait Emotional Intelligence and entrepreneurship", Personality and Individual Differences, Vol. 51, No. 8, 1028–1033.
[32] Almeida, P., Ahmetoglu, G., and Chamorro-Premuzic, T. (2013). "Who Wants to Be an Entrepreneur? The Relationship Between Vocational Interests and Individual Differences in Entrepreneurship", Journal of Career Assessment, Vol. 22, No. 1, 102–112.
[33] Leutner, F., Ahmetoglu, G., Akhtar, R., and Chamorro-Premuzic, T. (2014). "The relationship between the entrepreneurial personality and the Big Five personality traits", Personality and Individual Differences, Vol. 63, 58–63.
[34] Rauch, A., and Frese, M. (2009). "A Personality Approach to Entrepreneurship", doi:10.1093/oxfordhb/9780199234738.003.0006.
[35] Collins, C., Hanges, P. J., and Locke, E. (2004). "The Relationship of Achievement Motivation to Entrepreneurial Behavior: A Meta-Analysis", Human Performance, Vol. 17, No. 1, 95–117.
[36] Borghans, L., Duckworth, A. L., Heckman, J. J., and ter Weel, B. (2008). "The Economics and Psychology of Personality Traits", The Journal of Human Resources, Vol. 43, No. 4, 972–1059.
[37] Kichuk, S. L., and Wiesner, W. H. (1997). "The big five personality factors and team performance: implications for selecting successful product design teams", Journal of Engineering and Technology Management, Vol. 14, No. 3–4.
[38] Kanfer, R., and Ackerman, P. L. (1989). "Motivation and cognitive abilities: An integrative/aptitude-treatment interaction approach to skill acquisition", Journal of Applied Psychology, 74(4), 657–690.
[39] Hunter, J. E., and Hunter, R. F. (1984). "Validity and utility of alternative predictors of job performance", Psychological Bulletin, 96(1), 72–98.
[40] Kuhn, M. (2008). "Building Predictive Models in R Using the caret Package", Journal of Statistical Software, 28(5), 1–26. doi:10.18637/jss.v028.i05.
[41] Harrison, J. S., Thurgood, G. R., Boivie, S., and Pfarrer, M. D. (2020). "Perception Is Reality: How CEOs' Observed Personality Influences Market Perceptions of Firm Risk and Shareholder Returns", Academy of Management Journal 63, No. 4.
[42] Bartholomae, F., and Wiens, M. (2016). "Spieltheorie: Ein anwendungsorientiertes Lehrbuch" [Game Theory: An Application-Oriented Textbook], p. 11.
[43] Dubitzky, W., Granzow, M., and Berrar, D. (2007). "Fundamentals of Data Mining in Genomics and Proteomics", Springer Science & Business Media, p. 178.
[44] Rencher, A. C., and Schaalje, G. B. (2007). "Linear Models in Statistics", doi:10.1002/9780470192610, Chap. 18, 507–516.