
Counterfactual Analysis in Benchmarking

Peter Bogetoft*¹, Jasone Ramírez-Ayerbe², and Dolores Romero Morales¹

¹Department of Economics, Copenhagen Business School, Frederiksberg, Denmark
²Instituto de Matemáticas de la Universidad de Sevilla, Seville, Spain

* Peter Bogetoft: pb.eco@cbs.dk; Jasone Ramírez-Ayerbe: mrayerbe@us.es; Dolores Romero Morales: drm.eco@cbs.dk

Abstract

Conventional benchmarking based on simple key performance indicators (KPIs) is widely used and easy to understand. However, simple KPIs cannot fully capture the complex relationship between multiple inputs and outputs in most firms. Data Envelopment Analysis (DEA) offers an attractive alternative. It builds an activity analysis model of best practices considering the multiple inputs used and the products and services produced. This allows more substantial evaluations and also offers a framework that can support many other operational, tactical and strategic planning efforts. Unfortunately, a DEA model may be hard for managers to understand. In turn, this may lead to mistrust in the model and to difficulties in deriving actionable information from the model beyond the efficiency scores.

In this paper, we propose the use of counterfactual analysis to overcome these problems. We define DEA counterfactual instances as alternative combinations of inputs and outputs that are close to the original inputs and outputs of the firm and lead to desired improvements in its performance. We formulate the problem of finding counterfactual explanations in DEA as a bilevel optimization model. For a rich class of cost functions, reflecting the effort an inefficient firm will need to spend to change to its counterfactual, finding counterfactual explanations boils down to solving Mixed Integer Convex Quadratic Problems with linear constraints. We illustrate our approach using both a small numerical example and a real-world dataset on banking branches.

Keywords: Data Envelopment Analysis; Benchmarking; Counterfactual Explanations; Bilevel Optimization

1 Introduction

In surveys among business managers, benchmarking is consistently ranked as one of the most popular management tools (Rigby, 2015; Rigby and Bilodeau, 2015). The core of benchmarking is relative performance evaluation: the performance of one entity is compared to that of a group of other entities. The evaluated "entity" can be a firm, organization, manager, product or process, such as a school, a hospital, or a distribution system operator. In the following, it will be referred to simply as a Decision Making Unit (DMU). These comparisons can serve additional purposes, such as facilitating learning, decision making and incentive design.

There are many benchmarking approaches. Some are very simple and rely on the comparison of a DMU's Key Performance Indicators (KPIs) to those of a selected peer group of DMUs. The KPIs are often just a ratio of one input to one output, which makes KPI-based benchmarking easy to understand. However, such simple KPIs have obvious drawbacks. In particular, they tend to ignore the role of other inputs and outputs in real DMUs. More advanced benchmarking approaches have therefore emerged. They rely on frontier models using mathematical programming, e.g., Data Envelopment Analysis (DEA), and econometrics, e.g., Stochastic Frontier Analysis (SFA), to explicitly model the complex interaction between the multiple inputs and outputs among best-practice DMUs. There are by now many textbooks on DEA and SFA based methods, cf. e.g. Bogetoft and Otto (2011); Charnes et al. (1995); Parmeter and Zelenyuk (2019); Zhu (2016).

In this paper we focus on DEA based benchmarking. To construct the best practice performance frontier and evaluate the efficiency of a DMU relative to this frontier, DEA introduces a minimum of production economic regularities, typically convexity, and uses linear or mixed integer programming to capture the relationship between the multiple inputs and outputs of a DMU and to select the relevant ones, see Benítez-Peña et al. (2020); Esteve et al. (2023); Monge and Ruiz (2023); Peyrache et al. (2020) and references therein. In this sense, and in the eyes of the modeller, the method is well-defined, and several of the properties of the model will be understandable from the production economic regularities. Still, from the point of view of the evaluated DMUs, the model will appear very much like a black box. Understanding a multiple-input, multiple-output structure is inherently difficult. Also, in DEA, there is no explicit formula showing the impact of specific inputs on specific outputs, as there is in SFA or other econometrics-based approaches. This has led some researchers to look for extra information and structure in DEA models, most notably by viewing the black box as a network of more specific processes, cf. e.g. Cherchye et al. (2013); Färe and Grosskopf (2000); Kao (2009). The black box nature of DEA models may lead to some algorithm aversion and mistrust in the model, and to difficulties in deriving actionable information from the model beyond the efficiency scores. To overcome this, and to get insights into the functioning of a DEA model, it may therefore be interesting to look for counterfactual explanations, much like they are used in machine learning.

In interpretable machine learning (Du et al., 2019; Rudin et al., 2022), counterfactual analysis is used to explain the predictions made by black box models for individual instances (Guidotti, 2022; Karimi et al., 2022; Wachter et al., 2017). Given an instance for which an undesired outcome has been predicted, a counterfactual explanation is a close instance for which the desired outcome is predicted. It may help understand, for example, how to flip the predicted class of a credit application from rejected (undesired) to accepted (desired), where closeness can be defined through the magnitude of the changes incurred or the number of features that need to be perturbed. Counterfactual explanations are human-friendly explanations that describe the changes in the current values of the features needed to achieve the desired outcome, being thus contrastive to the current instance. Moreover, when the closeness measure penalizes the number of features perturbed, the counterfactual explanations focus on a small set of features.

In this paper, we propose the use of counterfactual analysis to understand and explain the efficiencies of individual DMUs as well as to learn about the estimated best practice technology. The DEA approach builds on comprehensive multiple-input, multiple-output relations estimated from actual practices. The level of complexity that such benchmarking models can capture vastly exceeds that of mental models and textbook examples. A benchmarking model therefore not only allows us to make substantiated evaluations of the past performance of individual DMUs. The framework, and in particular the underlying model of the technology, also allows us to answer a series of what-if questions that are very useful in operational, tactical and strategic planning efforts (Bogetoft, 2012).

In a DEA context, counterfactual analysis can help with learning, decision making and incentive design. In terms of learning, the DMU may be interested in knowing which simple changes in features (inputs and outputs) lead to a higher efficiency level. In the application investigated in our numerical section, this can be, for instance, how many credit officers or tellers a bank branch should remove to become fully efficient. This may help the evaluated DMU learn about and gain trust in the underlying modelling of the best practice performance relationships. In terms of decision making, counterfactual explanations may help guide the decision process by offering the smallest, the most plausible and actionable, and the least costly changes that lead to a desired boost in performance. How to define the least costly, or the most plausible or actionable, improvement paths depends on the context. In some cases it may be easier to reduce all inputs more or less the same (lawn mowing), while in other cases certain inputs should be reduced more aggressively than others, cf. Antle and Bogetoft (2019). Referring back to the application in the numerical section, reducing the use of different labor inputs could for example take into account the power of different labor unions and the capacity of management to struggle with multiple employee groups simultaneously. Likewise, it may be useful to learn the value of extra inputs or the tradeoffs between multiple outputs, since this can inform scaling and scope decisions. Lastly, counterfactual explanations may be useful in connection with incentive provisions. DEA models are routinely used by regulators of natural monopoly networks to incentivise cost reductions and service improvements. For an overview of benchmarking-based regulations, see Haney and Pollitt (2009) and later updates in Agrell and Bogetoft (2017); Bogetoft (2012). The regulated firms in such settings will naturally look for the easiest way to accommodate the regulator's targets. Counterfactual explanations may in such cases serve to guide the optimal strategic responses to the regulator's model-based requirements.

Unfortunately, it is not an entirely trivial task to construct counterfactual explanations in a DEA context. Besides allowing the DEA algorithm to interact with the DMU's strategies, we need to find alternative solutions that are in some sense close to the existing input-output combination used by a DMU. This involves finding "close" alternatives in the complement of a convex set (Thach, 1988). In this paper, we show how to formulate, as bilevel optimization models, the problem of determining "close" counterfactual explanations in DEA models that lead to desired relative performance levels and also take into account the strategic preferences of the entity. We investigate different ways to measure the closeness between a DMU and its counterfactual DMU, using, in particular, the ℓ0 and ℓ1 as well as ℓ2 norms. We consider changes in both input and output features and show how to formulate the problems in DEA models with different returns to scale assumptions. We hereby provide a set of formulations and results that facilitate the construction of DMU-relevant counterfactual explanations in a DEA context. We also illustrate our approach on both a small numerical example and a large scale real-world dataset involving bank branches.

The outline of the paper is as follows. In Section 2 we review the relevant literature. In Section 3 we introduce the necessary DEA notation for constructing counterfactual explanations, as well as a small numerical example. In Section 4 we describe our bilevel optimization formulation for the counterfactual problem in DEA and its reformulation as a Mixed Integer Convex Quadratic Problem with linear constraints. In Section 5 we discuss extensions of this model. In Section 6 we illustrate our approach with real-world data on bank branches. We end the paper with conclusions in Section 7.

2 Background and Literature

In this section, we give some background on DEA benchmarking, in particular on directional and interactive benchmarking, as well as on counterfactual analysis for interpretable machine learning.

Data Envelopment Analysis, DEA, was first introduced in Charnes et al. (1978, 1979) as a tool for measuring the efficiency and productivity of decision making units, DMUs. The idea of DEA is to model the production possibilities of the DMUs and to measure the performance of the individual DMUs relative to the production possibility frontier. The modelling is based on observed practices that form activities in a Linear Programming (LP) based activity analysis model.

Most studies use DEA models primarily to measure the relative efficiency of the DMUs. The benchmarking framework, the LP based activity analysis model, does however allow us to explore a series of other questions. In fact, the benchmarking framework can serve as a learning lab and decision support tool for managers.

In the DEA literature, this perspective has been emphasized by the idea of interactive benchmarking. Interactive benchmarking and the associated easy-to-use software have been used in a series of applications and consultancy projects, cf. e.g. Bogetoft (2012). The idea is that a DMU can search for alternative and attractive production possibilities and hereby learn about the technology, explore possible changes and trade-offs, and look for least cost changes that allow for necessary performance improvements, cf. also our discussion of learning, decision making, and incentive and regulation applications in the introduction.

One way to illustrate the idea of interactive benchmarking is as in Figure 1 below. A DMU has used two inputs to produce two outputs. Based on the data from other DMUs, an estimate of the best practice technology has been established, as illustrated by the piecewise linear input and output isoquants. The DMU may now be interested in exploring alternative paths towards best practices. One possibility is to save a lot of input 2 and somewhat less of input 1, i.e., to move in the direction dx illustrated by the arrow in the left panel. If the aim is to become fully efficient, this approach suggests that the DMU, instead of the present (input, output) combination (x, y), should consider the alternative (x̂, y). A similar logic could be used on the output side, keeping the inputs fixed, as illustrated in the right panel, where we assume that a roughly proportional increase in the two outputs is aimed at. Of course, in reality, one can also combine changes in the inputs and outputs.

[Figure 1 about here. Two panels with piecewise linear DEA isoquants. Left panel, "Input side": the (Input 1, Input 2) isoquant, with the present input mix x moved along the direction -dx to the frontier point x̂ = x - e·dx. Right panel, "Output side": the (Output 1, Output 2) isoquant, with the present output mix y moved along dy to ŷ = y + e·dy.]

Figure 1: Directional search for an alternative production plan to (x, y) along (dx, dy) using (DIR)

Formally, the directional distance function approach, sometimes referred to as the excess problem, requires solving the following mathematical programming problem

\[
\begin{array}{rl}
\max_{e} & e \qquad\qquad (\mathrm{DIR})\\
\text{s.t.} & (x - e\,d_x,\; y + e\,d_y) \in T^*,
\end{array}
\]

where x and y are the present values of the input and output vectors, dx and dy are the improvement directions in input and output space, T* is the estimated set of feasible (input, output) combinations, and e is the magnitude of the movement.
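To make the search concrete, the following is a minimal sketch of how (DIR) can be solved as a linear program under a CRS DEA technology spanned by the observed firms. The function name and data layout are our own illustration, not part of the Interactive Benchmarking software discussed below.

```python
import numpy as np
from scipy.optimize import linprog

def excess(x, y, dx, dy, X, Y):
    """Solve (DIR): the largest e with (x - e*dx, y + e*dy) in T*(CRS).

    x, y   : present input/output vectors of the DMU, shapes (I,), (O,)
    dx, dy : chosen improvement directions in input/output space
    X, Y   : the K+1 observed firms as columns, shapes (I, K+1), (O, K+1)
    """
    K1 = X.shape[1]
    # Decision variables: [e, lambda_0, ..., lambda_K]; linprog minimizes,
    # so we minimize -e in order to maximize e.
    c = np.zeros(1 + K1); c[0] = -1.0
    # x - e*dx >= X @ lam   <=>   e*dx + X @ lam <= x
    A_in = np.hstack([dx.reshape(-1, 1), X])
    # y + e*dy <= Y @ lam   <=>   e*dy - Y @ lam <= -y
    A_out = np.hstack([dy.reshape(-1, 1), -Y])
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([x, -y]),
                  bounds=[(0.0, None)] * (1 + K1))
    return -res.fun
```

For a VRS technology, one would add the equality that the λk sum to one (via the A_eq argument of linprog).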

In the DEA literature, the direction (dx, dy) is often thought of as a set of given parameters, and the excess as one of many possible ways to measure the distance to the frontier. A few authors have advocated that some directions are more natural than others, and there have been attempts to endogenize the choice of this direction, cf. e.g. Bogetoft and Hougaard (1999); Färe et al. (2017); Petersen (2018). One can also think of the improvement directions as reflecting the underlying strategy of the DMU, or simply as a steering tool that the DMU uses to create one or more interesting points on the frontier.

Figure 2 illustrates the real-world example involving bank branches from the application section. The analysis is here done using the directional distance function approach (DIR) as implemented in the so-called Interactive Benchmarking software, cf. Bogetoft (2012). The search "Direction" is chosen by adjusting the horizontal handles for each input and output and is expressed in percentages of the existing inputs and outputs. The resulting best practice alternative is illustrated in the "Benchmark" column. We see that the DMU in this example expresses an interest in reducing Supervision and Credit personnel but simultaneously seeks to increase the number of personal loan accounts.

Figure 2: Directional search in Interactive Benchmarking software. Real-world dataset of bank branches

Applications of interactive benchmarking have typically been in settings where the DMU, in a trial-and-error like process, seeks alternative production plans. Such processes can certainly be useful in attempts to learn about and gain trust in the modelling, to guide decision making, and to find the performance-enhancing changes that a DMU may find relatively easy to implement. From the point of view of Multiple Criteria Decision Making (MCDM), we can think of such processes as based on progressive articulation of preferences and alternatives, cf. e.g. the taxonomy of MCDM methods suggested in Rostami et al. (2017).

It is clear from this small example, however, that the use of an interactive process guided solely by the DMU may not always be the best approach. If there are more than a few inputs and outputs, the process can become difficult to steer towards some underlying optimal compromise between the many possible changes in inputs and outputs. In such cases, so-called prior articulation of preferences may be more useful. If the DMU can express its preferences for different changes, e.g., as a (change) cost function C((x,y),(x*,y*)) giving the cost of moving from the present production plan (x,y) to any new production plan (x*,y*), then a systematic search for the optimal change is possible. The approach of this paper is based on this idea. We consider a class of cost functions and show how to find optimal changes in inputs and outputs using bilevel optimization. In this sense, it corresponds to endogenizing the directional choice so as to make the necessary changes in inputs and outputs as small as possible. Of course, by varying the parameters of the cost function, one can also generate a reasonably representative set of alternative production plans that the DMU can then choose from. This would correspond to the idea of a prior articulation of alternatives approach in the MCDM taxonomy.

Machine learning approaches like Deep Learning, Random Forests, Support Vector Machines, and XGBoost are often seen as powerful tools in terms of learning accuracy, but also as black boxes in terms of how the model arrives at its outcome. Therefore, regulations from, among others, the EU are enforcing more transparency in the so-called field of algorithmic decision making (European Commission, 2020; Goodman and Flaxman, 2017). A plethora of tools is being developed in the nascent field of explainable artificial intelligence to help understand how tools in machine learning and artificial intelligence make decisions (Lundberg et al., 2020; Lundberg and Lee, 2017; Martens and Provost, 2014; Molnar et al., 2020). In particular, in counterfactual analysis, given an individual instance for which an undesired outcome has been predicted, one is interested in building an alternative instance, the so-called counterfactual instance, revealing how to change the features of the current instance so that the counterfactual instance is predicted the desired outcome. The counterfactual explanation problem is written as a mathematical optimization problem. To define the problem, one needs to model the feasible space, a cost function measuring the cost of the movement from the current instance to the counterfactual one, and a set of constraints that ensures that the counterfactual explanation is predicted the desired outcome. In general, the counterfactual explanation problem reads as a constrained nonlinear problem but, for score-based classifiers and cost functions defined by a convex combination of the norms ℓ0, ℓ1 and ℓ2, equivalent Mixed Integer Linear Programming or Mixed Integer Convex Quadratic with Linear Constraints formulations can be defined, see, e.g., Carrizosa et al. (2021); Fischetti and Jo (2018); Parmentier and Vidal (2021).

In the following, we combine the idea of DEA with the idea of counterfactual explanations. Technically, this leads us to formulate and solve bilevel optimization models to determine "close" counterfactual explanations in DEA models that lead to desired relative performance levels and also take into account the strategic preferences of the entity. We also illustrate our approach on both a small numerical example and a large scale real-world problem.

3 The Setting

We consider K + 1 DMUs (indexed by k), using I inputs, x^k = (x^k_1, ..., x^k_I)ᵀ ∈ R^I_+, to produce O outputs, y^k = (y^k_1, ..., y^k_O)ᵀ ∈ R^O_+. Hereafter, we will write (x^k, y^k) to refer to the production plan of DMU k, k = 0, 1, ..., K.

Let T be the technology set, with

\[
T = \{(x, y) \in \mathbb{R}^I_+ \times \mathbb{R}^O_+ \mid x \text{ can produce } y\}.
\]

We will initially estimate T by the classical DEA model. It determines the empirical reference technology T* as the smallest subset of R^I_+ × R^O_+ that contains the actual K + 1 observations and satisfies the classical DEA regularities of convexity, free disposability in inputs and outputs, and Constant Returns to Scale (CRS). It is easy to see that the estimated technology can be described as:

\[
T^*(\mathrm{CRS}) = \Big\{(x, y) \in \mathbb{R}^I_+ \times \mathbb{R}^O_+ \;\Big|\; \exists\, \lambda \in \mathbb{R}^{K+1}_+ : x \ge \sum_{k=0}^K \lambda_k x^k,\; y \le \sum_{k=0}^K \lambda_k y^k \Big\}.
\]

To measure the efficiency of a firm, we will initially use the so-called Farrell input-oriented efficiency. It measures the efficiency of a DMU, say DMU 0, as the largest proportional reduction E^0 of all its inputs x^0 that still allows the production of its present outputs y^0 in the technology T*. Hence, it is equal to the optimal solution value of the following LP formulation

\[
\begin{array}{rl}
\min_{E, \lambda_0,\dots,\lambda_K} & E \qquad\qquad (\mathrm{DEA})\\
\text{s.t.} & E x^0 \ge \displaystyle\sum_{k=0}^K \lambda_k x^k\\
& y^0 \le \displaystyle\sum_{k=0}^K \lambda_k y^k\\
& 0 \le E \le 1\\
& \lambda \in \mathbb{R}^{K+1}_+ .
\end{array}
\]

This DEA model has K + 2 decision variables, I linear input constraints and O linear output constraints. Hereafter, we will refer to the optimal objective value of (DEA), say E^0, as the efficiency of DMU 0.

In the following, and assuming that firm 0 with production plan (x^0, y^0) is not fully efficient, E^0 < 1, we will show how to calculate a counterfactual explanation with a desired efficiency level E* > E^0, i.e., the minimum changes needed in the inputs of the firm, x^0, in order to obtain an efficiency of E*. Given a cost function C(x^0, x̂) that measures the cost of moving from the present inputs x^0 to the new counterfactual inputs x̂, and a set X(x^0) defining the feasible space for x̂, the counterfactual explanation for x^0 is found by solving the following optimization problem:

\[
\begin{array}{rl}
\min_{\hat{x}} & C(x^0, \hat{x})\\
\text{s.t.} & \hat{x} \in \mathcal{X}(x^0)\\
& (\hat{x}, y^0) \text{ has at least an efficiency of } E^*.
\end{array}
\]

With respect to C(x^0, x̂), different norms can be used to measure the difficulty of changing the inputs. A DMU may, for example, be interested in minimizing the sum of the squared deviations between the present and the counterfactual inputs. We model this using the squared Euclidean norm ℓ2². Likewise, there may be an interest in minimizing the absolute value of the deviations, which we can capture using the ℓ1 norm, or the number of inputs changed, which we can capture with the ℓ0 norm. When it comes to X(x^0), this would include the nonnegativity of x̂, as well as domain-knowledge-specific constraints. With this approach, we detect the most important inputs in terms of the impact they have on the DMU's efficiency, with enough flexibility to consider different costs of change depending on the DMU's characteristics.

In the next section, we will show that finding counterfactual explanations involves solving a bilevel optimization problem: minimizing the changes in inputs while solving the above DEA problem at the same time. In Section 5, we will also discuss how the counterfactual analysis approach can be extended to other efficiency measures, like the output-oriented Farrell efficiency, and to other DEA technologies.

Before turning to the details of the bilevel optimization problem, it is useful to illustrate the ideas using a small numerical example. Suppose we have four firms with the inputs, outputs, and Farrell input efficiencies as in Table 1. The efficiency has been calculated solving the classical DEA model with CRS, namely (DEA). In this example, firms 1 and 2 are fully efficient, whereas firms 3 and 4 are not.

Firm   x1     x2     y   E
1      0.50   1      1   1
2      1.50   0.50   1   1
3      1.75   1.25   1   0.59
4      2.50   1.25   1   0.50

Table 1: Inputs, outputs and corresponding Farrell input-efficiency of 4 different firms
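For readers who want to reproduce Table 1, here is a minimal sketch of (DEA) using scipy.optimize.linprog; the helper name and data layout are ours. With the Table 1 data, firm 3 (index 2) should come out at roughly 0.59.

```python
import numpy as np
from scipy.optimize import linprog

def farrell_input_efficiency(k0, X, Y):
    """Farrell input efficiency of firm k0 under (DEA) with CRS.

    X, Y: observed inputs/outputs, shapes (I, K+1) and (O, K+1).
    Decision variables: [E, lambda_0, ..., lambda_K].
    """
    I, K1 = X.shape
    c = np.zeros(1 + K1); c[0] = 1.0                     # minimize E
    A_in = np.hstack([-X[:, [k0]], X])                   # X lam - E x^0 <= 0
    A_out = np.hstack([np.zeros((Y.shape[0], 1)), -Y])   # -Y lam <= -y^0
    b_ub = np.concatenate([np.zeros(I), -Y[:, k0]])
    bounds = [(0.0, 1.0)] + [(0.0, None)] * K1
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=b_ub, bounds=bounds)
    return res.fun

# Table 1 data: columns are firms 1-4 (two inputs, one output).
X = np.array([[0.50, 1.50, 1.75, 2.50],
              [1.00, 0.50, 1.25, 1.25]])
Y = np.array([[1.0, 1.0, 1.0, 1.0]])
print(round(farrell_input_efficiency(2, X, Y), 2))  # firm 3 -> 0.59
```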

First, we want to know the changes needed in x^3 for firm 3 to have a new efficiency E* of at least 80%. Since we only have two inputs, we can illustrate this graphically, as in Figure 3a. The results are shown in Table 2 for different cost functions. In all cases we get exactly 80% efficiency with the new inputs. We see from column ℓ2² that the Farrell solution is further away from the original inputs than the counterfactual solution based on the Euclidean norm. To the extent that the difficulty of change is captured by the ℓ2² norm, we can conclude that the Farrell solution is not ideal. Moreover, in the Farrell solution one must by definition change both inputs, see column ℓ0. Using a cost function combining the ℓ0 norm and the squared Euclidean norm, denoted by ℓ0 + ℓ2, one penalizes the number of inputs changed. With this we detect the one input that should change in order to obtain a higher efficiency, namely the second input.

Cost function   x̂1     x̂2     y   E     ℓ2²    ℓ0
Farrell         1.29    0.92    1   0.8   0.32   2
ℓ0 + ℓ2         1.75    0.69    1   0.8   0.31   1
ℓ2              1.53    0.80    1   0.8   0.25   2

Table 2: Counterfactual explanations for firm 3 in Table 1 imposing E* = 0.8 and different cost functions

Let us now focus on firm 4 and again find a counterfactual instance with at least 80% efficiency. The results are shown in Table 3 and Figure 3b. Notice how the Farrell case again yields the farthest solution, and also the least sparse of the three. As for the counterfactual explanations obtained with our methodology, the inputs nearest to the original DMU that give the desired efficiency lie on a non-full facet of the efficient frontier. Therefore, we only need to change one input, namely, the second input.

Cost function   x̂1     x̂2     y   E     ℓ2²    ℓ0
Farrell         1.56    0.78    1   0.8   1.10   2
ℓ0 + ℓ2         2.50    0.63    1   0.8   0.39   1
ℓ2              2.50    0.63    1   0.8   0.39   1

Table 3: Counterfactual explanations for firm 4 in Table 1 imposing E* = 0.8 and different cost functions

In both figures, the space where we search for the counterfactual explanation is shaded. As commented before, this involves finding "close" inputs in the complement of a convex set, which we will tackle with bilevel optimization in the next section.

[Figure 3 about here: (a) Explanations for firm 3; (b) Explanations for firm 4.]

Figure 3: Counterfactual explanations for firms 3 and 4 in Tables 2 and 3, respectively, imposing E* = 0.8 and different cost functions

4 Bilevel optimization for counterfactual analysis in DEA

Suppose DMU 0 is not fully efficient, i.e., the optimal objective value of Problem (DEA) is E^0 < 1. In this section, we formulate the counterfactual explanation problem in DEA, i.e., the problem that calculates the minimum cost changes in the inputs x^0 that give DMU 0 a higher efficiency. Let x̂ be the new inputs of DMU 0 that would make it at least E* efficient, with E* > E^0. With this, we have defined the counterfactual instance as the one obtained by changing the inputs; in the same vein, we could define it by changing the outputs. This alternative output-based problem will be studied in Section 5.

Since the values of the inputs are to be changed, the efficiency of the new production plan (x̂, y^0) has to be calculated using Problem (DEA). The counterfactual explanation problem in DEA reads as follows:

\[
\begin{array}{rlr}
\min_{\hat{x}} & C(x^0, \hat{x}) & (1)\\
\text{s.t.} & \hat{x} \in \mathbb{R}^I_+ & (2)\\
& E \ge E^* & (3)\\
& E \in \displaystyle\arg\min_{\bar{E},\lambda_0,\dots,\lambda_K}\Big\{\bar{E} : \bar{E}\hat{x} \ge \sum_{k=0}^K \lambda_k x^k,\; y^0 \le \sum_{k=0}^K \lambda_k y^k, & (4)\\
& \qquad\qquad\qquad\qquad \bar{E} \ge 0,\; \lambda \in \mathbb{R}^{K+1}_+\Big\}, & (5)
\end{array}
\]

where in the upper-level problem (1) we minimize the cost of changing the inputs of firm 0 from x^0 to x̂, ensuring nonnegativity of the inputs, as in constraint (2), and that the efficiency is at least E*, as in constraint (3). The lower-level problem (4)-(5) ensures that the efficiency of (x̂, y^0) is correctly calculated. Therefore, as opposed to counterfactual analysis in interpretable machine learning, here we are confronted with a bilevel optimization problem. Notice also that when calculating the efficiency in the lower-level problem (4)-(5), the technology is already fixed, and the new DMU (x̂, y^0) does not take part in its calculation.

In what follows, we reformulate the bilevel optimization problem (1)-(5) as a single-level model by exploiting the optimality conditions of the lower-level problem. This can be done for convex lower-level problems that satisfy Slater's condition, e.g., if our lower-level problem were linear. In our case, however, not all the constraints are linear, since in (4) we have the product of decision variables Ē x̂. To handle this, we define new decision variables, namely F = 1/E and βk = λk/E, for k = 0, ..., K, and write F* = 1/E*. Thus, (1)-(5) is equivalent to:

\[
\begin{array}{rlr}
\min_{\hat{x}, F} & C(x^0, \hat{x}) &\\
\text{s.t.} & \hat{x} \in \mathbb{R}^I_+ &\\
& F \le F^* &\\
& F \in \displaystyle\arg\max_{\bar{F},\beta}\Big\{\bar{F} : \hat{x} \ge \sum_{k=0}^K \beta_k x^k,\; \bar{F} y^0 \le \sum_{k=0}^K \beta_k y^k, & (6)\\
& \qquad\qquad\qquad \bar{F} \ge 0,\; \beta \in \mathbb{R}^{K+1}_+\Big\}. & (7)
\end{array}
\]

This equivalent bilevel optimization problem can now be reformulated as a single-level model. The new lower-level problem in (6)-(7) can be seen as the x̂-parametrized problem:

\[
\begin{array}{rlr}
\max_{F, \beta} & F & (8)\\
\text{s.t.} & \hat{x} \ge \displaystyle\sum_{k=0}^K \beta_k x^k & (9)\\
& F y^0 \le \displaystyle\sum_{k=0}^K \beta_k y^k & (10)\\
& F \ge 0 & (11)\\
& \beta \ge 0. & (12)
\end{array}
\]

The Karush-Kuhn-Tucker (KKT) conditions, which include primal and dual feasibility, stationarity and complementarity conditions, are necessary and sufficient to characterize an optimal solution. Thus, we can replace problem (6)-(7) by its KKT conditions. Primal feasibility is given by (9)-(12). Dual feasibility is given by:

\[
\gamma_I, \gamma_O, \delta, \mu \ge 0, \qquad (13)
\]

for the dual variables associated with constraints (9)-(12), where γ_I ∈ R^I_+, γ_O ∈ R^O_+, δ ∈ R_+ and µ ∈ R^{K+1}_+. The stationarity conditions are as follows:

\[
\begin{array}{lr}
\gamma_O^\top y^0 - \delta = 1 & (14)\\
\gamma_I^\top x^k - \gamma_O^\top y^k - \mu_k = 0, \quad k = 0,\dots,K. & (15)
\end{array}
\]

Lastly, we need the complementarity conditions for all constraints (9)-(12). For constraint (9), we have:

\[
\gamma_I^i = 0 \;\text{ or }\; \hat{x}_i - \sum_{k=0}^K \beta_k x^k_i = 0, \qquad i = 1,\dots,I. \qquad (16)
\]

In order to model this disjunction, we introduce binary variables u_i ∈ {0,1}, i = 1, ..., I, and the following constraints using the big-M method:

\[
\gamma_I^i \le M_I u_i, \qquad \hat{x}_i - \sum_{k=0}^K \beta_k x^k_i \le M_I (1 - u_i), \qquad i = 1,\dots,I, \qquad (17)
\]

where M_I is a sufficiently large constant.

The same can be done for the complementarity condition for constraint (10), introducing binary variables v_o ∈ {0,1}, o = 1, ..., O, a big-M constant M_O, and constraints:

\[
\gamma_O^o \le M_O v_o, \qquad -F y^0_o + \sum_{k=0}^K \beta_k y^k_o \le M_O (1 - v_o), \qquad o = 1,\dots,O. \qquad (18)
\]

The complementarity condition for constraint (12) would be the disjunction βk = 0 or µk = 0. Using the stationarity condition (15) and again the big-M method with binary variables w_k ∈ {0,1}, k = 0, ..., K, and a big-M constant M_f, one obtains the constraints:

\[
\beta_k \le M_f w_k, \qquad \gamma_I^\top x^k - \gamma_O^\top y^k \le M_f (1 - w_k), \qquad k = 0,\dots,K. \qquad (19)
\]

Finally, for constraint (11) the complementarity condition yields F = 0 or δ = 0. Remember that F = 1/E with 0 ≤ E ≤ 1; thus F cannot be zero by definition, and we must impose δ = 0. Using stationarity condition (14), this yields:

\[
\gamma_O^\top y^0 = 1. \qquad (20)
\]

We now reflect on the meaning of these constraints. Notice that constraints (17) and (18) model the slacks of the inputs and outputs respectively, while constraint (19) models the firms that define the frontier, i.e., the firms with which DMU 0 is to be compared. If the binary variable u_i = 1, then there is no slack in input i, i.e., x̂_i = Σ_{k=0}^K βk x^k_i, whereas u_i = 0 means there is. The same happens with binary variable v_o, namely, it indicates whether there is a slack in output o. On the other hand, when w_k = 1, the dual constraint holds with equality, γ_Iᵀ x^k = γ_Oᵀ y^k, i.e., firm k is fully efficient and is used to define the efficiency of the counterfactual instance. If w_k = 0, then βk = 0, and firm k is not being used to define the efficiency of the counterfactual instance.

Notice that µk appears only in (15); since µk ≥ 0, it can be eliminated, turning (15) into an inequality. In addition, we know that δ = 0. Therefore, we can transform the stationarity conditions (14) and (15) into

\[
\begin{array}{lr}
\gamma_O^\top y^0 = 1 & (21)\\
\gamma_I^\top x^k - \gamma_O^\top y^k \ge 0, \quad k = 0,\dots,K & (22)\\
\gamma_I, \gamma_O \ge 0, & (23)
\end{array}
\]

which are exactly the constraints one finds in the dual DEA model for the Farrell output efficiency.
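For reference, a standard way to write that dual (multiplier) problem, here evaluated at the counterfactual inputs x̂, is the following (our rendering, included for orientation only):

\[
\begin{array}{rl}
\min_{\gamma_I, \gamma_O} & \gamma_I^\top \hat{x}\\
\text{s.t.} & \gamma_O^\top y^0 = 1\\
& \gamma_I^\top x^k - \gamma_O^\top y^k \ge 0, \qquad k = 0,\dots,K\\
& \gamma_I, \gamma_O \ge 0,
\end{array}
\]

whose optimal value equals F = 1/E by LP duality.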

The new reformulation of the counterfactual explanation problem in DEA is as follows:

\[
\begin{array}{rll}
\min_{\hat{x}, F, \beta, \gamma_I, \gamma_O, u, v, w} & C(x^0, \hat{x}) &\\
\text{s.t.} & F \le F^* &\\
& \hat{x} \in \mathbb{R}^I_+ &\\
& u, v, w \in \{0,1\} &\\
& (9)\text{-}(12) & \text{primal}\\
& (21)\text{-}(23) & \text{dual}\\
& (17)\text{-}(18) & \text{slacks}\\
& (19) & \text{frontier}.
\end{array}
\]

So far, we have not been very specific about the objective function C(x^0, x̂). Different functional forms can be introduced, and this may require the introduction of further variables to implement them. In the numerical examples above, we approximated the firm's cost-of-change using different combinations of the ℓ0 norm, the ℓ1 norm, and the squared ℓ2 norm. They are widely used in machine learning when close counterfactuals are sought in an attempt to understand how to get a more attractive outcome (Carrizosa et al., 2023). The ℓ0 "norm", which strictly speaking is not a norm in the usual mathematical sense, counts the number of dimensions that have to be changed. The ℓ1 norm is the sum of the absolute values of the deviations. Lastly, ℓ2² is the squared Euclidean norm, which squares the deviations.

14

As a starting point, we therefore propose the following objective function:

C(x0,ˆ

x) = ν0∥x0−ˆ

x∥0+ν1∥x0−ˆ

x∥1+ν2∥x0−ˆ

x∥2

2,(24)

where ν0, ν1, ν2≥0. Taking into account that there may be speciﬁc product input prices and output

prices or that inputs may have varying degrees of diﬃculty to be changed, one can consider giving dif-

ferent weights to the deviations in each of the inputs.
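A natural weighted variant (our notation; ω_i > 0 is an input-specific weight, e.g., a price or a measure of how sticky input i is) would be

\[
C(x^0,\hat{x}) \;=\; \nu_0 \sum_{i=1}^I \mathbb{1}\!\left[x^0_i \neq \hat{x}_i\right] + \nu_1 \sum_{i=1}^I \omega_i \,|x^0_i - \hat{x}_i| + \nu_2 \sum_{i=1}^I \omega_i \,(x^0_i - \hat{x}_i)^2,
\]

which leaves the structure of the optimization problems below unchanged, since the weights enter linearly.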

In order to have a smooth expression of objective function (24), additional decision variables and constraints have to be added to the counterfactual explanation problem in DEA. To linearize the ℓ0 norm, binary decision variables ξ_i are introduced, where ξ_i = 1 allows x^0_i ≠ x̂_i, i = 1, ..., I. Using the big-M method, the following constraints are added to our formulation:

\[
\begin{array}{lr}
-M_{zero}\,\xi_i \le x^0_i - \hat{x}_i \le M_{zero}\,\xi_i, \quad i = 1,\dots,I & (25)\\
\xi_i \in \{0,1\}, \quad i = 1,\dots,I, & (26)
\end{array}
\]

where M_{zero} is a sufficiently large constant.

where Mzero is a suﬃciently large constant.

For the ℓ1norm we introduce continuous decision variables ηi≥0, i= 1, . . . , I, to measure the abso-

lute values of the deviations, ηi=

x0

i−ˆxi

, which is naturally implemented by the following constraints:

ηi≥x0

i−ˆxi, i = 1, . . . , I (27)

−ηi≤x0

i−ˆxi, i = 1, . . . , I (28)

ηi≥0, i = 1, . . . , I. (29)

Thus, the counterfactual explanation problem in DEA with cost function C in (24), hereafter (CEDEA), reads as follows:

\[
\begin{array}{rl}
\min_{\hat{x}, F, \beta, \gamma_I, \gamma_O, u, v, w, \eta, \xi} & \nu_0 \displaystyle\sum_{i=1}^I \xi_i + \nu_1 \sum_{i=1}^I \eta_i + \nu_2 \sum_{i=1}^I \eta_i^2 \qquad (\mathrm{CEDEA})\\
\text{s.t.} & F \le F^*\\
& \hat{x} \in \mathbb{R}^I_+\\
& u, v, w \in \{0,1\}\\
& (9)\text{-}(12),\; (17)\text{-}(19),\; (21)\text{-}(23),\\
& (25)\text{-}(26),\; (27)\text{-}(29).
\end{array}
\]

15

Notice that in Problem (CEDEA) we have assumed X(x0) = RI

+as the feasible space for ˆx. Other

relevant constraints for the counterfactual inputs could easily be added, e.g., bounds or relative bounds

on the inputs, or inputs that cannot be changed in the short run, say capital expenses, or that represent

environmental conditions beyond the control of the DMU.

In the case where only the ℓ0and ℓ1norms are considered, i.e., ν2= 0, the objective function as well

as the constraints are linear, while we have both binary and continuous decision variables. Therefore,

Problem (CEDEA) can be solved using an Mixed Integer Linear Programming (MILP) solver. Otherwise,

when ν2= 0, Problem (CEDEA) is a Mixed Integer Convex Quadratic model with linear constraints,

which can be solved with standard optimization packages. When all three norms are used, Problem

(CEDEA) has 3I+K+2+Ocontinuous variables and 2I+O+K+ 1 binary decision variables. It

has 7I+ 3O+ 3K+ 5 constraints, plus the non-negativity and binary nature of the variables. The

computational experiments show that this problem can be solved eﬃciently for our real-world dataset.
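To make the single-level model concrete, here is a minimal sketch of (CEDEA) in gurobipy, the solver interface used in Section 6. The function name, argument conventions and default big-M values are our own, and any domain-specific constraints in X(x^0) are omitted.

```python
import gurobipy as gp
from gurobipy import GRB

def cedea(k0, X, Y, E_star, nu=(1.0, 0.0, 1e-3),
          MI=1000.0, MO=1000.0, Mf=1000.0, Mz=1.0):
    """Counterfactual explanation (CEDEA) for firm k0 under CRS.

    X, Y: observed inputs/outputs, shapes (I, K+1) and (O, K+1).
    nu:   weights (nu0, nu1, nu2) of the cost function (24).
    """
    I, K1, O = X.shape[0], X.shape[1], Y.shape[0]
    x0, y0 = X[:, k0], Y[:, k0]
    nu0, nu1, nu2 = nu
    m = gp.Model("CEDEA")
    xh = m.addVars(I, lb=0.0, name="xhat")
    F = m.addVar(lb=0.0, ub=1.0 / E_star, name="F")       # F <= F* = 1/E*
    b = m.addVars(K1, lb=0.0, name="beta")
    gI = m.addVars(I, lb=0.0, name="gammaI")
    gO = m.addVars(O, lb=0.0, name="gammaO")
    u = m.addVars(I, vtype=GRB.BINARY, name="u")
    v = m.addVars(O, vtype=GRB.BINARY, name="v")
    w = m.addVars(K1, vtype=GRB.BINARY, name="w")
    xi = m.addVars(I, vtype=GRB.BINARY, name="xi")
    eta = m.addVars(I, lb=0.0, name="eta")
    for i in range(I):
        lhs = gp.quicksum(b[k] * X[i, k] for k in range(K1))
        m.addConstr(lhs <= xh[i])                          # (9)
        m.addConstr(gI[i] <= MI * u[i])                    # (17)
        m.addConstr(xh[i] - lhs <= MI * (1 - u[i]))        # (17)
        m.addConstr(x0[i] - xh[i] <= Mz * xi[i])           # (25)
        m.addConstr(xh[i] - x0[i] <= Mz * xi[i])           # (25)
        m.addConstr(eta[i] >= x0[i] - xh[i])               # (27)
        m.addConstr(eta[i] >= xh[i] - x0[i])               # (28)
    for o in range(O):
        lhs = gp.quicksum(b[k] * Y[o, k] for k in range(K1))
        m.addConstr(F * y0[o] <= lhs)                      # (10)
        m.addConstr(gO[o] <= MO * v[o])                    # (18)
        m.addConstr(lhs - F * y0[o] <= MO * (1 - v[o]))    # (18)
    m.addConstr(gp.quicksum(gO[o] * y0[o] for o in range(O)) == 1)   # (21)
    for k in range(K1):
        red = gp.quicksum(gI[i] * X[i, k] for i in range(I)) \
            - gp.quicksum(gO[o] * Y[o, k] for o in range(O))
        m.addConstr(red >= 0)                              # (22)
        m.addConstr(b[k] <= Mf * w[k])                     # (19)
        m.addConstr(red <= Mf * (1 - w[k]))                # (19)
    m.setObjective(nu0 * xi.sum() + nu1 * eta.sum()
                   + nu2 * gp.quicksum(eta[i] * eta[i] for i in range(I)),
                   GRB.MINIMIZE)
    m.optimize()
    return [xh[i].X for i in range(I)]
```

Note that the quadratic term is written in the η variables, which coincide with |x^0_i - x̂_i| at any optimal solution whenever ν2 > 0, since the objective is increasing in η.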

We can think of the objective function C in different ways.

One possibility is to see it as an instrument to explore the production possibilities. The use of combinations of the ℓ0, ℓ1 and ℓ2 norms seems natural here. Possible extensions could involve other ℓp norms,

\[
\|x^0 - \hat{x}\|_p := \Big(\sum_{i=1}^I |x^0_i - \hat{x}_i|^p\Big)^{1/p}.
\]

For all p ∈ [1, ∞), ℓp is convex. This makes the use of ℓp norms convenient in generalizations of Problem (CEDEA). Of course, arbitrary ℓp norms may lead to more complicated implementations in existing software, since the objective function may no longer be quadratic.

Closely related to the instrumental view of the objective function is the idea of approximations. At least as a reasonable initial approximation of more complicated functions, many objective functions C can be approximated by the form in (24).

To end, one can link the form of C closer to economic theory. In the economic literature there have been many studies of factor adjustment costs. It is commonly believed that firms change their demand for inputs only gradually and with some delay, cf. e.g. Hamermesh and Pfann (1999). For labor inputs, the factor adjustment costs include disruptions to production occurring when changing employment causes workers' assignments to be rearranged. Laying off or hiring new workers is also costly. There are search costs (advertising, screening, and processing new employees); the cost of training (including disruptions to production as previously trained workers' time is devoted to on-the-job instruction of new workers); severance pay (mandated and otherwise); and the overhead cost of maintaining that part of the personnel function dealing with recruitment and worker outflows. Following again Hamermesh and Pfann (1999), the literature on both labor and capital goods adjustments has overwhelmingly relied on one form of C, namely that of symmetric convex adjustment costs, much like we use in (24). Indeed, in the case of only one production factor, the most widely used functional form is simply the quadratic one. Hall (2004) and several others have tried to estimate the costs of adjusting labor and capital inputs. Using a Cobb-Douglas production function, and absent adjustment costs and changes in the ratios of factor prices, an increase in demand or in another determinant of industry equilibrium would cause factor inputs to change in the same proportion as outputs. Adjustment costs are introduced as reductions in outputs and are assumed to depend on the squared growth rates of labor and capital inputs: the larger the percentage change, the larger the adjustment costs. Another economic approach to cost-of-change modelling is to think of habits. In firms, as in private life, habits are useful. In the performance of many tasks, including complicated ones, it is easiest to go into automatic mode and let a behavior unfold. When an efficiency requirement is introduced, habits may need to change, and this is costly. The relevance and strength of habit formation has also been studied empirically using panel data, cf. e.g. Dynan (2000) and the references therein. Habit formation should ideally be considered in a dynamic framework. To keep it simple, we might consider two periods: the past, where x^0 was used, and the present, where x̂ is consumed. The utility in period two will then typically depend on the difference or ratio of present to past consumption, x̂ - x^0 or, in the unidimensional case, x̂/x^0. Examples of functional forms one can use are provided in, for example, Fuhrer (2000).

5 Extensions

In this section we discuss two extensions that can be carried out using the previous model as a basis. First, we formulate the model for other technologies, specifically for DEA with variable returns to scale (VRS). Second, we present the formulation for the case where the outputs instead of the inputs are changed in order to generate the counterfactual explanation. In both extensions, we again consider a combination of the ℓ0, ℓ1 and ℓ2 norms as in objective function (24).

5.1 Changing the returns to scale

In Section 4, we considered the DEA model with constant returns to scale (CRS), where the only requirement on the values of λ is that they are nonnegative, i.e., λ ∈ R^{K+1}_+, but we could consider other technologies. In that case, to be able to transform our bilevel optimization problem into a single-level one, we should take into account that for each new constraint derived from the conditions on λ, a new dual variable has to be introduced. We will consider the variable returns to scale (VRS) model, as it is one of the models most preferred by firms (Bogetoft, 2012), but extensions to other models are analogous.

Considering the input case, with the same transformation as before, we have the following problem:

\[
\begin{array}{rl}
\min_{\hat{x}, E} & C(x^0, \hat{x})\\
\text{s.t.} & \hat{x} \in \mathbb{R}^I_+\\
& E \ge E^*\\
& E \in \displaystyle\arg\min_{\bar{E},\lambda_0,\dots,\lambda_K}\Big\{\bar{E} : \bar{E}\hat{x} \ge \sum_{k=0}^K \lambda_k x^k,\; y^0 \le \sum_{k=0}^K \lambda_k y^k,\\
& \qquad\qquad \bar{E} \ge 0,\; \lambda \in \mathbb{R}^{K+1}_+,\; \sum_{k=0}^K \lambda_k = 1\Big\}.
\end{array}
\]

We can transform it to:

\[
\begin{array}{rl}
\min_{\hat{x}, F} & C(x^0, \hat{x})\\
\text{s.t.} & \hat{x} \in \mathbb{R}^I_+\\
& F \le F^*\\
& F \in \displaystyle\arg\max_{\bar{F},\beta}\Big\{\bar{F} : \hat{x} \ge \sum_{k=0}^K \beta_k x^k,\; \bar{F} y^0 \le \sum_{k=0}^K \beta_k y^k,\\
& \qquad\qquad \bar{F} \ge 0,\; \beta \in \mathbb{R}^{K+1}_+,\; \sum_{k=0}^K \beta_k = F\Big\}.
\end{array}
\]

Notice that the only difference is that we have a new constraint associated with the technology, namely Σ_{k=0}^K βk = F. Let κ ≥ 0 be the new dual variable associated with this constraint. Then, the following changes are made in the stationarity constraints (21) and (22):

\[
\begin{array}{lr}
\gamma_O^\top y^0 + \kappa = 1 & (30)\\
\gamma_I^\top x^k - \gamma_O^\top y^k - \kappa \ge 0, \quad k = 0,\dots,K. & (31)
\end{array}
\]

The single-level formulation of the counterfactual problem for the VRS DEA model is as follows:

\[
\begin{array}{rl}
\min_{\hat{x}, F, \beta, \gamma_I, \gamma_O, u, v, w, \kappa, \eta, \xi} & \nu_0 \displaystyle\sum_{i=1}^I \xi_i + \nu_1 \sum_{i=1}^I \eta_i + \nu_2 \sum_{i=1}^I (x^0_i - \hat{x}_i)^2 \qquad (\mathrm{CEVDEA})\\
\text{s.t.} & F \le F^*\\
& \displaystyle\sum_{k=0}^K \beta_k = F\\
& \hat{x} \in \mathbb{R}^I_+\\
& \kappa \ge 0\\
& u, v, w \in \{0,1\}\\
& (9)\text{-}(12),\; (17)\text{-}(19)\\
& (23),\; (25)\text{-}(31).
\end{array}
\]
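Relative to the CRS sketch of (CEDEA) given earlier, the VRS variant needs only a few extra lines inside the model-building function (again a sketch under the same assumptions; we replace (21)-(22) by (30)-(31)):

```python
# Additions to the cedea() sketch for (CEVDEA); place before setObjective().
kappa = m.addVar(lb=0.0, name="kappa")        # dual of the VRS constraint
m.addConstr(b.sum() == F)                     # sum_k beta_k = F
# (30): gammaO' y0 + kappa = 1   (replaces (21))
m.addConstr(gp.quicksum(gO[o] * y0[o] for o in range(O)) + kappa == 1)
# (31): gammaI' x^k - gammaO' y^k - kappa >= 0 for all k (replaces (22));
# presumably the frontier condition (19) should then bound this same
# reduced cost, i.e., include -kappa as well.
for k in range(K1):
    m.addConstr(gp.quicksum(gI[i] * X[i, k] for i in range(I))
                - gp.quicksum(gO[o] * Y[o, k] for o in range(O))
                - kappa >= 0)
```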

5.2 Changing the outputs

We have calculated the counterfactual instance of a firm as the minimum cost changes in the inputs needed to obtain a better efficiency. In the same vein, we could instead consider changes in the outputs, leaving the inputs unchanged. Again, suppose firm 0 is not fully efficient, having E^0 < 1. Now, we are interested in calculating the minimum changes in the outputs y^0 that make it attain a higher efficiency E* > E^0. Let ŷ be the new outputs of firm 0 that make it at least E* efficient. We have, then, the following bilevel optimization problem:

\[
\begin{array}{rl}
\min_{\hat{y}, E} & C(y^0, \hat{y})\\
\text{s.t.} & \hat{y} \in \mathbb{R}^O_+\\
& E \ge E^*\\
& E \in \displaystyle\arg\min_{\bar{E},\lambda_0,\dots,\lambda_K}\Big\{\bar{E} : \bar{E}x^0 \ge \sum_{k=0}^K \lambda_k x^k,\; \hat{y} \le \sum_{k=0}^K \lambda_k y^k,\\
& \qquad\qquad \bar{E} \ge 0,\; \lambda \in \mathbb{R}^{K+1}_+\Big\}.
\end{array}
\]

Following similar steps as in the previous section, the single-level formulation for the counterfactual problem in DEA in the output case is as follows:

\[
\begin{array}{rll}
\min_{\hat{y}, E, \lambda, \gamma_I, \gamma_O, u, v, w, \eta, \xi} & \nu_0 \displaystyle\sum_{o=1}^O \xi_o + \nu_1 \sum_{o=1}^O \eta_o + \nu_2 \sum_{o=1}^O (y^0_o - \hat{y}_o)^2 & (\mathrm{CEODEA})\\
\text{s.t.} & \hat{y} \in \mathbb{R}^O_+ &\\
& E \ge E^* &\\
& E x^0 \ge \displaystyle\sum_{k=0}^K \lambda_k x^k &\\
& \hat{y} \le \displaystyle\sum_{k=0}^K \lambda_k y^k &\\
& \gamma_I^\top x^0 = 1 &\\
& \gamma_O^\top y^k - \gamma_I^\top x^k \le 0 & k = 0,\dots,K\\
& \gamma_I^i \le M_I u_i & i = 1,\dots,I\\
& E x^0_i - \displaystyle\sum_{k=0}^K \lambda_k x^k_i \le M_I(1 - u_i) & i = 1,\dots,I\\
& \gamma_O^o \le M_O v_o & o = 1,\dots,O\\
& -\hat{y}_o + \displaystyle\sum_{k=0}^K \lambda_k y^k_o \le M_O(1 - v_o) & o = 1,\dots,O\\
& \lambda_k \le M_f w_k & k = 0,\dots,K\\
& \gamma_I^\top x^k - \gamma_O^\top y^k \le M_f(1 - w_k) & k = 0,\dots,K\\
& -M_{zero}\,\xi_o \le y^0_o - \hat{y}_o \le M_{zero}\,\xi_o & o = 1,\dots,O\\
& \eta_o \ge y^0_o - \hat{y}_o & o = 1,\dots,O\\
& -\eta_o \le y^0_o - \hat{y}_o & o = 1,\dots,O\\
& E, \lambda, \gamma_I, \gamma_O, \eta \ge 0 &\\
& u, v, w, \xi \in \{0,1\}.
\end{array}
\]

As in the input model, depending on the cost function, we either obtain an MILP model or a Mixed Integer Convex Quadratic model with linear constraints. This model can be formulated analogously for the VRS case.

6 A banking application

In this section, we illustrate our methodology using real-world data on bank branches (Schaffnit et al., 1997), by constructing a collection of counterfactual explanations for each of the inefficient firms that can help them learn about the DEA benchmarking model and how they can improve their efficiency.

The data is described in more detail in Section 6.1, where we consider a model of bank branch production with I = 5 inputs and O = 3 outputs, and thus a production possibility set in R^8_+, spanned by K + 1 = 267 firms. In Section 6.2, we will focus on changing the inputs, and therefore the counterfactual explanations will be obtained with Problem (CEDEA). We will discuss the results obtained with different cost-of-change functions C, reflecting the effort an inefficient firm will need to spend to change to its counterfactual instance, and different desired levels of efficiency E*. The Farrell projection discussed in Section 3 is added for reference. The counterfactual analysis sheds light on the nature of the DEA benchmarking model, which is otherwise hard to comprehend because of the many firms, inputs and outputs involved in the construction of the technology.

All optimization models have been implemented using Python 3.8 with Gurobi 9.0 as the solver (Gurobi Optimization, 2021). We have solved Problem (CEDEA) with M_I = M_O = M_f = 1000 and M_zero = 1. The validity of M_zero = 1 will be shown below. Our numerical experiments have been conducted on a PC with an Intel Core i7-1065G7 CPU @ 1.30GHz (1.50 GHz) processor and 16 gigabytes of RAM. The operating system is 64-bit.

6.1 The data

The data consist of five staff categories and three different types of outputs in the Ontario branches of a large Canadian bank. The inputs are measured as full-time equivalents (FTEs), and the outputs are the average monthly counts of the different transactions and maintenance activities. Observations with input values equal to 0 are removed, leaving us with a dataset of 267 branches. Summary statistics are provided in Table 4.

                           Mean   Min    Max     Std. dev.
INPUTS
Teller                     5.83   0.49   39.74   3.80
Typing                     1.05   0.03   22.92   1.84
Accounting & ledgers       4.69   0.80   65.93   5.13
Supervision                2.05   0.43   38.29   2.66
Credit                     4.40   0.35   55.73   6.19
OUTPUTS
Term accounts              2788   336    22910   2222
Personal loan accounts     117    0      1192    251
Commercial loan accounts   858    104    8689    784

Table 4: Descriptive statistics of the Canadian bank branches dataset in Schaffnit et al. (1997)

After calculating all the efficiencies through Problem (DEA), one finds that 236 of the 267 firms are inefficient. Out of those, 219 firms have an efficiency below 90%, 186 below 80%, 144 below 70%, 89 below 60%, and 49 below 50%.

6.2 Counterfactual analysis of bank branches

To examine the inefficient firms, we will determine counterfactual explanations for these. Prior to that, we have divided each input by its maximum value across all firms. We notice that this has no impact on the solution, since DEA models are invariant to linear transformations of inputs and outputs. Also, this makes the choice M_zero = 1 valid, since the values of all inputs are upper bounded by 1.
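As a small illustration of this preprocessing step (our code; the random array is a stand-in for the FTE data of Table 4):

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.uniform(0.5, 40.0, size=(267, 5))  # stand-in for the 267x5 FTE inputs
X = X_raw / X_raw.max(axis=0)                  # divide each input by its maximum
assert X.max() <= 1.0                          # hence Mzero = 1 is a valid bound
```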

We will use three different cost functions, obtained by changing the values of the parameters ν0, ν1, ν2 in (24), as well as two different values of the desired efficiency of the counterfactual instance, namely E* = 1 and E* = 0.8. In the first implementation of the cost function, which we denote ℓ0 + (ℓ2), we use ν0 = 1, ν2 = 10^{-3} and ν1 = 0, i.e., we seek to minimize the ℓ0 norm and only introduce a small amount of the squared Euclidean norm to ensure a unique solution of Problem (CEDEA). In the second implementation, which we call ℓ0 + ℓ2, we take ν0 = 1, ν1 = 0 and ν2 = 10^5, such that the squared Euclidean norm has a higher weight than in cost function ℓ0 + (ℓ2). Finally, we denote by ℓ2 the cost function that focuses on the minimization of the squared Euclidean norm only, i.e., ν0 = ν1 = 0 and ν2 = 1. A summary of the cost functions used can be found in Table 5. Calculations were also done for the ℓ1 norm, i.e., ν0 = ν2 = 0 and ν1 = 1, but as the solutions found were similar to those for cost function ℓ0 + ℓ2, they are omitted for clarity of presentation. We start with the discussion of the counterfactual explanations obtained with E* = 1, as summarized in Figures 4-5 and Tables 6-7. We then move on to a less demanding desired efficiency, namely E* = 0.8. These results are summarized in Figures 6-7 and Tables 8-9.

Cost function   ν0   ν1   ν2
ℓ0 + (ℓ2)       1    0    10^{-3}
ℓ0 + ℓ2         1    0    10^5
ℓ2              0    0    1

Table 5: Value of the parameters ν0, ν1 and ν2 in (24) for the different cost functions used

[Figure 4 about here.]

Figure 4: Counterfactual explanations for firm 238 with Problem (CEDEA) and desired efficiency E* = 1.

Let us first visualize the counterfactual explanations for a specific firm. Consider, for instance, firm 238, which has an original efficiency of E^0 = 0.72. We can visualize the different counterfactual explanations generated by the different cost functions using a spider chart, see Figure 4. In addition to the counterfactual explanations obtained with Problem (CEDEA), we also illustrate the so-called Farrell projection. In the spider chart, each axis represents an input, and the original values of the firm correspond to the outer circle. Figure 4 shows the different changes needed depending on the cost function used. With ℓ0 + (ℓ2), where the focus is mainly on penalizing the number of inputs changed, we see that only the typing personnel has to be changed, leaving the rest of the inputs unchanged. Nevertheless, because only one input is changed, it has to be decreased by 60% from its original value. The Farrell solution decreases the typing personnel by 28% of its value but, to compensate, changes the remaining four inputs proportionally. When the ℓ0 + ℓ2 cost function is used, the typing personnel still needs to be changed, but the change is smaller, this time 51% of its value. The supervision personnel also needs to be decreased, by 16% of its value, while the rest of the inputs remain untouched. Increasing the weight on the Euclidean norm in the cost function gives us the combination of the two inputs that are crucial to change in order to gain efficiency, as well as the exact amount by which they need to be reduced. Finally, using only the Euclidean norm, the typing, supervision and credit personnel are the inputs to be changed; the typing input is reduced slightly less than with ℓ0 + ℓ2 in exchange for reducing the credit input by just 1%. Notice that the teller and the accounting and ledgers personnel are never changed in the counterfactual explanations generated by our methodology, which leads us to think that these inputs are not the ones responsible for firm 238's original low efficiency.

The analysis above is for a single firm. We now present some statistics about the counterfactual explanations obtained for all the inefficient firms. Recall that these are 236 firms, and that Problem (CEDEA) has been solved for each of them. In Table 6 we show, for each cost function, how often an input has to be changed. For instance, the value 0.09 in the last row of the Teller column shows that in 9% of all firms, we have to change the number of tellers when the aim is to find a counterfactual instance using the Euclidean norm. When more weight is given to the ℓ0 norm, fewer inputs are changed. Indeed, for the Teller column, with ℓ0 + ℓ2, 3% of all firms change it, instead of 9%, and this number decreases to 1% when ℓ0 + (ℓ2) is used. The same pattern can be observed in all inputs, and is particularly notable for the Acc. and Ledgers personnel, which goes from changing in more than half of the banks with the Euclidean norm to changing in only 14% of the firms. The last column of Table 6, Mean ℓ0(x - x̂), shows how many inputs are changed on average when we use the different cost functions. With ℓ0 + (ℓ2) only one input has to be decreased; thus, with this cost function one detects the crucial input to be modified to become fully efficient, leaving the rest fixed. In general, the results show that, for the inefficient firms, the most common changes leading to full efficiency are to reduce the number of typists and the number of credit officers. The excess of typists is likely related to the institutional setting. Bank branches need the so-called typists for judicial regulations, but they only need the services to a limited degree, see also Table 4. In such cases, it may be difficult to match the full-time equivalents employed precisely to the need. The excess of credit officers is more surprising since, in particular, they are one of the best paid personnel groups.

In Table 7, we look at the size of the changes, and not just whether a change has to take place. The value 0.43 in the first row of the Teller column means that, when the teller numbers have to be changed, they are reduced by 43% of their initial value on average. Since several inputs may have to change simultaneously, we define the vector of relative changes $r = \left( (x_i - \hat{x}_i)/x_i \right)_{i=1}^{I}$; the last column shows the mean value of the Euclidean norm of this vector. We see, for example, that in the relatively few cases where the teller personnel has to change under $\ell_0 + (\ell_2)$, the changes are relatively large. We also see again the difficulties the bank branches apparently have in hiring the right number of typists. We saw in Table 6 that the typing personnel often has to change, and we now see that the changes are non-trivial, amounting to roughly half of the typing full-time equivalents employed.

Cost function          Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean $\ell_0(x - \hat{x})$
$\ell_0 + (\ell_2)$     0.01     0.38         0.14              0.13       0.34            1.00
$\ell_0 + \ell_2$       0.03     0.40         0.17              0.14       0.38            1.13
$\ell_2$                0.09     0.45         0.51              0.21       0.47            1.72

Table 6: Average results on how often the inputs (personnel types) change when the desired efficiency is $E^* = 1$.

Cost function          Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean $\ell_2(r)$
$\ell_0 + (\ell_2)$     0.43     0.61         0.41              0.43       0.37         0.4743
$\ell_0 + \ell_2$       0.21     0.58         0.33              0.38       0.35         0.4742
$\ell_2$                0.11     0.53         0.14              0.27       0.29         0.4701

Table 7: Average results on how much the inputs (personnel types) change when the desired efficiency is $E^* = 1$.
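The statistics in Tables 6 and 7 are simple to reproduce once the counterfactuals have been computed. Below is a minimal sketch, assuming `X` and `X_hat` are arrays holding the original and counterfactual inputs of the inefficient firms (rows are firms, columns the five personnel types, all strictly positive); the helper name and data layout are our own.

```python
import numpy as np

def change_statistics(X, X_hat, tol=1e-6):
    """Summary statistics of the counterfactual changes, as in Tables 6 and 7."""
    X, X_hat = np.asarray(X, dtype=float), np.asarray(X_hat, dtype=float)
    changed = np.abs(X - X_hat) > tol             # which inputs change, per firm
    freq = changed.mean(axis=0)                   # Table 6: share of firms changing each input
    mean_l0 = changed.sum(axis=1).mean()          # Table 6, last column: mean number of changes
    r = (X - X_hat) / X                           # relative changes r_i = (x_i - xhat_i) / x_i
    rel = np.where(changed, np.abs(r), np.nan)    # only count an input when it actually changes
    mean_rel = np.nanmean(rel, axis=0)            # Table 7: mean relative change, given a change
    mean_l2_r = np.linalg.norm(r, axis=1).mean()  # Table 7, last column: mean ||r||_2
    return freq, mean_l0, mean_rel, mean_l2_r
```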

In Figure 5, we use a heatmap to illustrate which input factors have to change for the individual firms under the three different cost functions in Table 6. Rows with no markings represent firms that were fully efficient to begin with. We see, as expected, that the more weight we put on the Euclidean norm, the more densely populated the illustration becomes, i.e., the more inputs have to change simultaneously.

(a) $C = \ell_0 + (\ell_2)$   (b) $C = \ell_0 + \ell_2$   (c) $C = \ell_2$

Figure 5: The inputs that change when we impose a desired efficiency of $E^* = 1$.
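A heatmap of this kind is straightforward to produce from the boolean change matrix computed above; the sketch below uses matplotlib, with randomly generated placeholder data standing in for the actual change indicators.

```python
import numpy as np
import matplotlib.pyplot as plt

# Rows are firms, columns the five inputs; a cell is marked
# when the counterfactual changes that input.
rng = np.random.default_rng(0)         # placeholder data only
changed = rng.random((236, 5)) < 0.3   # 236 inefficient firms, five inputs
labels = ["Teller", "Typing", "Acc. and Ledgers", "Supervision", "Credit"]

plt.imshow(changed, aspect="auto", cmap="Greys", interpolation="nearest")
plt.xticks(range(5), labels, rotation=45, ha="right")
plt.xlabel("Input")
plt.ylabel("Firm")
plt.tight_layout()
plt.show()
```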

So far we have asked for counterfactual instances that are fully efficient. If we instead only ask for the counterfactual instances to be at least 80% efficient, only 186 firms need to be studied. As before, let us first visualize the counterfactual explanations for firm 238, which had an original efficiency of $E^0 = 0.72$. In Figure 6, we can see the different changes when imposing $E^* = 0.8$. We again see that the Farrell approach reduces all inputs proportionally, specifically by 9.5% of their values. We also see that under the $\ell_0 + (\ell_2)$ cost function, only the credit personnel has to be reduced, by 15%. Under the Euclidean norm, the teller and the Acc. and Ledgers personnel are not affected, while the typing, supervision and credit officers have to be reduced, by 4%, 13% and 7%, respectively. Notice that only the change in supervision is higher in this case than in the Farrell solution, while the decrease in the remaining four inputs is significantly smaller for the Euclidean norm. Recall that in Figure 4 the counterfactual explanations for the same firm 238 were calculated imposing $E^* = 1$. Altering the desired efficiency level from $E^* = 0.8$ to $E^* = 1$ leads to rather dramatic changes in the counterfactual explanations. For the $\ell_0 + (\ell_2)$ cost function, a desired efficiency of $E^* = 0.8$ calls for a decrease in the credit personnel, whereas for a desired efficiency of $E^* = 1$, it is suggested to leave the credit personnel unchanged and to change the typing personnel instead. What remains the same is that the teller and the Acc. and Ledgers officers are never affected in the counterfactual explanations under the three cost functions.

After the analysis for a single firm, we now present statistics about the counterfactual explanations obtained for all 168 firms that had an original efficiency below 80%. The frequency of the changes and their relative sizes are shown in Tables 8 and 9. We see, as we would expect, that the number of changes needed is smaller. On the other hand, the inputs to be changed are not vastly different; the tendency to change the credit officers in particular is slightly larger now.

Figure 6: Counterfactual explanations for DMU 238 with Problem (CEDEA) and desired efficiency $E^* = 0.8$.

Cost function          Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean $\ell_0(x - \hat{x})$
$\ell_0 + (\ell_2)$     0.01     0.30         0.18              0.08       0.44            1.00
$\ell_0 + \ell_2$       0.03     0.32         0.19              0.09       0.47            1.11
$\ell_2$                0.13     0.43         0.51              0.18       0.63            1.88

Table 8: Average results on how often the inputs (personnel types) change when the desired efficiency is $E^* = 0.8$.

In Figure 7, we show the input factors that need to change for the individual firms under the three different cost functions in Table 8, now with $E^* = 0.8$. As expected, we see an increasing number of rows with no markings compared to Figure 5, belonging to the firms that already had an efficiency of at least 0.8.

Cost function          Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean $\ell_2(r)$
$\ell_0 + (\ell_2)$     0.28     0.56         0.28              0.41       0.27         0.3708
$\ell_0 + \ell_2$       0.12     0.52         0.25              0.39       0.26         0.3707
$\ell_2$                0.05     0.41         0.11              0.25       0.21         0.3702

Table 9: Average results on how much the inputs (personnel types) change when the desired efficiency is $E^* = 0.8$.

(a) $C = \ell_0 + (\ell_2)$   (b) $C = \ell_0 + \ell_2$   (c) $C = \ell_2$

Figure 7: The inputs that change when we impose a desired efficiency of $E^* = 0.8$.

7 Conclusions

In this paper, we have proposed a collection of optimization models to calculate counterfactual explanations in DEA models, i.e., the least costly changes in the inputs or outputs of a firm that lead to a pre-specified higher efficiency level. With our methodology, we are able to include different ways to measure the proximity between a firm and its counterfactual, namely, using the $\ell_0$, $\ell_1$ and $\ell_2$ norms or combinations thereof.

Calculating counterfactual explanations involves finding "close" alternatives in the complement of a convex set. We have reformulated this bilevel optimization problem as either an MILP or a Mixed Integer Convex Quadratic Problem with linear constraints. In our numerical section, we have seen that for our banking application these models can be solved to optimality.
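To illustrate the flavor of such a single-level model, the sketch below shows how the $\ell_0$ term can be handled with binary indicators in a Mixed Integer Convex Quadratic model, written in gurobipy syntax. It is only a sketch: the DEA-efficiency constraints of Problem (CEDEA) are omitted and marked with a placeholder (without them the optimum is trivially $\hat{x} = x$), inputs are assumed to only decrease, and the data, weights and big-M bound are hypothetical.

```python
import gurobipy as gp
from gurobipy import GRB

x = [4.0, 7.5, 6.0, 2.0, 5.0]       # original inputs of the firm (hypothetical)
nu0, nu2, M = 1.0, 1e-3, max(x)     # Table 5 weights and a valid big-M bound

m = gp.Model("CEDEA-sketch")
x_hat = m.addVars(len(x), lb=0.0, ub=x, name="x_hat")  # counterfactual inputs
z = m.addVars(len(x), vtype=GRB.BINARY, name="z")      # 1 iff input i is changed

for i in range(len(x)):
    # If z[i] = 0, the counterfactual must keep input i at its original value.
    m.addConstr(x[i] - x_hat[i] <= M * z[i])

# ... here one would add the (linearized) constraints enforcing that
# x_hat attains the desired efficiency E* in the DEA model ...

m.setObjective(nu0 * z.sum()
               + nu2 * gp.quicksum((x[i] - x_hat[i]) ** 2 for i in range(len(x))),
               GRB.MINIMIZE)
m.optimize()
```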

DEA models can capture very complex relationships between multiple inputs and outputs. This allows more substantial evaluations and also offers a framework that can support many operational, tactical and strategic planning efforts. However, there is also a risk that such a model is seen as a pure black box, which in turn can lead to mistrust and some degree of model or algorithm aversion. By looking at counterfactuals, a firm can get a better understanding of the production space and is more likely to trust the modelling.

Counterfactual explanations in DEA can also help a firm choose which changes to implement. It is not always enough to simply think of a strategy in terms of which factors can easily be changed, say a direction in input space. It also matters what the technology looks like, and therefore how large such changes need to be to obtain a desired improvement in efficiency. In this way, the analysis of close counterfactual explanations can help endogenize the choice of directions to move in that are both desirable and effective. By varying the parameters of the cost function, the firm can even obtain a menu of counterfactuals to choose from, which gives it more flexibility and leads the evaluated firm to place more trust in the underlying model.

Note also that, by calculating the counterfactual explanations for all firms involved, as we did in our banking application, one can determine which combinations of inputs and outputs most commonly have to be changed to improve efficiency. This is interesting from an overall system point of view. Society at large, or for example a regulator tasked primarily with incentivizing natural monopolies to improve efficiency, may not solely be interested in everyone becoming efficient. It may also be important how the efficiency is improved, e.g., by reducing the use of imported or domestic resources, or by laying off some particular types of labor and not others.

There are several interesting extensions that can be explored in future research; here we just mention two. One possibility is to use alternative efficiency measures to constrain the search for counterfactual instances. We have used Farrell efficiency, which is by far the most common efficiency measure in DEA studies, but one might consider alternative measures, e.g., additive ones like the excess measure. Another relevant extension could be to make the counterfactuals less individualized. One could, for example, look for the common features that counterfactual explanations should change across all individual firms so as to lead to the minimum total cost.

Acknowledgements

This research has been financed in part by research projects EC H2020 MSCA RISE NeEDS (Grant agreement ID: 822214); COST Action CA19130 "Fintech and Artificial Intelligence in Finance - Towards a transparent financial industry" (FinAI); FQM-329, P18-FR-2369 and US-1381178 (Junta de Andalucía); and PID2019-110886RB-I00 (Ministerio de Ciencia, Innovación y Universidades, Spain). This support is gratefully acknowledged. Similarly, we appreciate the financial support from the Independent Research Fund Denmark, Grant 9038-00042A.


References

Agrell, P. and Bogetoft, P. (2017). Theory, techniques and applications of regulatory benchmarking and productivity analysis. In Oxford Handbook of Productivity Analysis, pages 523–555. Oxford University Press, Oxford.

Antle, R. and Bogetoft, P. (2019). Mix stickiness under asymmetric cost information. Management Science, 65(6):2787–2812.

Benítez-Peña, S., Bogetoft, P., and Romero Morales, D. (2020). Feature selection in data envelopment analysis: A mathematical optimization approach. Omega, 96:102068.

Bogetoft, P. (2012). Performance Benchmarking - Measuring and Managing Performance. Springer, New York.

Bogetoft, P. and Hougaard, J. (1999). Efficiency evaluation based on potential (non-proportional) improvements. Journal of Productivity Analysis, 12:233–247.

Bogetoft, P. and Otto, L. (2011). Benchmarking with DEA, SFA, and R. Springer, New York.

Carrizosa, E., Ramírez Ayerbe, J., and Romero Morales, D. (2021). Generating collective counterfactual explanations in score-based classification via mathematical optimization. Technical report, IMUS, Sevilla, Spain, https://www.researchgate.net/publication/353073138_Generating_Collective_Counterfactual_Explanations_in_Score-Based_Classification_via_Mathematical_Optimization.

Carrizosa, E., Ramírez Ayerbe, J., and Romero Morales, D. (2023). Mathematical optimization modelling for group counterfactual explanations. Technical report, IMUS, Sevilla, Spain, https://www.researchgate.net/publication/368958766_Mathematical_Optimization_Modelling_for_Group_Counterfactual_Explanations.

Charnes, A., Cooper, W. W., Lewin, A. Y., and Seiford, L. M. (1995). Data Envelopment Analysis: Theory, Methodology and Applications. Kluwer Academic Publishers, Boston, USA.

Charnes, A., Cooper, W. W., and Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2:429–444.

Charnes, A., Cooper, W. W., and Rhodes, E. (1979). Short communication: Measuring the efficiency of decision making units. European Journal of Operational Research, 3:339.

Cherchye, L., De Rock, B., Dierynck, B., Roodhooft, F., and Sabbe, J. (2013). Opening the "black box" of efficiency measurement: Input allocation in multioutput settings. Operations Research, 61(5):1148–1165.

Du, M., Liu, N., and Hu, X. (2019). Techniques for interpretable machine learning. Communications of the ACM, 63(1):68–77.

Dynan, K. (2000). Habit formation in consumer preferences: Evidence from panel data. American Economic Review, 90(3):391–406.

Esteve, M., Aparicio, J., Rodriguez-Sala, J., and Zhu, J. (2023). Random forests and the measurement of super-efficiency in the context of free disposal hull. European Journal of Operational Research, 304(2):729–744.

European Commission (2020). White Paper on Artificial Intelligence: a European approach to excellence and trust. https://ec.europa.eu/info/publications/white-paper-artificial-intelligence-european-approach-excellence-and-trust_en.

Färe, R. and Grosskopf, S. (2000). Network DEA. Socio-Economic Planning Sciences, 34(1):35–49.

Färe, R., Pasurka, C., and Vardanyan, M. (2017). On endogenizing direction vectors in parametric directional distance function-based models. European Journal of Operational Research, 262:361–369.

Fischetti, M. and Jo, J. (2018). Deep neural networks and mixed integer linear optimization. Constraints, 23(3):296–309.

Fuhrer, J. C. (2000). Habit formation in consumption and its implications for monetary-policy models. The American Economic Review, 90(3):367–390.

Goodman, B. and Flaxman, S. (2017). European Union regulations on algorithmic decision-making and a "right to explanation". AI Magazine, 38(3):50–57.

Guidotti, R. (2022). Counterfactual explanations and how to find them: literature review and benchmarking. Forthcoming in Data Mining and Knowledge Discovery.

Gurobi Optimization, LLC (2021). Gurobi optimizer reference manual.

Hall, R. (2004). Measuring factor adjustment costs. The Quarterly Journal of Economics, 119(3):899–927.

Hamermesh, D. S. and Pfann, G. A. (1999). Adjustment costs in factor demand. Journal of Economic Literature, 34(3):1264–1292.

Haney, A. and Pollitt, M. (2009). Efficiency analysis of energy networks: An international survey of regulators. Energy Policy, 37(12):5814–5830.

Kao, C. (2009). Efficiency decomposition in network data envelopment analysis: A relational model. European Journal of Operational Research, 192:949–962.

Karimi, A.-H., Barthe, G., Schölkopf, B., and Valera, I. (2022). A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Computing Surveys, 55(5):1–29.

Lundberg, S., Erion, G., Chen, H., DeGrave, A., Prutkin, J., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., and Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1):56–67.

Lundberg, S. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774.

Martens, D. and Provost, F. (2014). Explaining data-driven document classifications. MIS Quarterly, 38(1):73–99.

Molnar, C., Casalicchio, G., and Bischl, B. (2020). Interpretable machine learning: A brief history, state-of-the-art and challenges. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 417–431. Springer.

Monge, J. F. and Ruiz, J. L. (2023). Setting closer targets based on non-dominated convex combinations of Pareto-efficient units: A bi-level linear programming approach in data envelopment analysis. Forthcoming in European Journal of Operational Research.

Parmentier, A. and Vidal, T. (2021). Optimal counterfactual explanations in tree ensembles. In International Conference on Machine Learning, pages 8422–8431. PMLR.

Parmeter, C. and Zelenyuk, V. (2019). Combining the virtues of stochastic frontier and data envelopment analysis. Operations Research, 67(6):1628–1658.

Petersen, N. (2018). Directional distance functions in DEA with optimal endogenous directions. Operations Research, 66(4):1068–1085.

Peyrache, A., Rose, C., and Sicilia, G. (2020). Variable selection in data envelopment analysis. European Journal of Operational Research, 282(2):644–659.

Rigby, D. (2015). Management tools 2015 - an executive's guide. Bain & Company.

Rigby, D. and Bilodeau, B. (2015). Management tools and trends 2015. Bain & Company.

Rostami, S., Neri, F., and Epitropakis, M. (2017). Progressive preference articulation for decision making in multi-objective optimisation problems. Integrated Computer-Aided Engineering, 24(4):315–335.

Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., and Zhong, C. (2022). Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistics Surveys, 16:1–85.

Schaffnit, C., Rosen, D., and Paradi, J. (1997). Best practice analysis of bank branches: An application of DEA in a large Canadian bank. European Journal of Operational Research, 98(2):269–289.

Thach, P. (1988). The design centering problem as a DC programming problem. Mathematical Programming, 41(1):229–248.

Wachter, S., Mittelstadt, B., and Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841–887.

Zhu, J. (2016). Data Envelopment Analysis - A Handbook on Models and Methods. Springer, New York.
