
Counterfactual Analysis and Target Setting in Benchmarking

Peter Bogetoft∗1, Jasone Ramírez-Ayerbe†2, and Dolores Romero Morales‡1

1Department of Economics, Copenhagen Business School, Frederiksberg, Denmark

2Instituto de Matemáticas de la Universidad de Sevilla, Seville, Spain

Abstract

Data Envelopment Analysis (DEA) allows us to capture the complex relationship between multiple inputs and outputs in firms and organizations. Unfortunately, managers may find it hard to understand a DEA model, and this may lead to mistrust in the analyses and to difficulties in deriving actionable information from the model. In this paper, we propose to use the ideas of target setting in DEA and of counterfactual analysis in Machine Learning to overcome these problems. We define DEA counterfactuals or targets as alternative combinations of inputs and outputs that are close to the original inputs and outputs of the firm and lead to desired improvements in its performance. We formulate the problem of finding counterfactuals as a bilevel optimization model. For a rich class of cost functions, reflecting the effort an inefficient firm will need to spend to change to its counterfactual, finding counterfactual explanations boils down to solving Mixed Integer Convex Quadratic Problems with linear constraints. We illustrate our approach using both a small numerical example and a real-world dataset on banking branches.

Keywords— Data Envelopment Analysis; Benchmarking; DEA Targets; Counterfactual Explanations; Bilevel Optimization

1 Introduction

In surveys among business managers, benchmarking is consistently ranked as one of the most popular

management tools (Rigby, 2015; Rigby and Bilodeau, 2015). The core of benchmarking is relative

performance evaluation. The performance of one entity is compared to that of a group of other entities.

The evaluated “entity” can be a ﬁrm, organization, manager, product or process. In the following, it

will be referred to simply as a Decision Making Unit (DMU).

There are many benchmarking approaches, and they can serve different purposes, such as facilitating learning, decision making, and incentive design. Some approaches are very simple and rely on the

comparison of a DMU’s Key Performance Indicators (KPIs) to those of a selected peer group of DMUs.

These KPIs are basically partial productivity measures (e.g., labour productivity, yield per hectare, etc.).

This makes KPI based benchmarking easy to understand, but also potentially misleading by ignoring the

role of other inputs and outputs in real DMUs. More advanced benchmarking approaches rely on frontier

models using mathematical programming, e.g., Data Envelopment Analysis (DEA), and Econometrics,

∗Peter Bogetoft: pb.eco@cbs.dk

†Jasone Ramírez-Ayerbe: mrayerbe@us.es

‡Dolores Romero Morales: drm.eco@cbs.dk


e.g., Stochastic Frontier Analysis (SFA), and they allow us to explicitly model the complex interaction

between the multiple inputs and outputs among best-practice DMUs, cf. e.g. Bogetoft and Otto (2011);

Charnes et al. (1995); Parmeter and Zelenyuk (2019); Zhu (2016).

In this paper we focus on DEA based benchmarking. To construct the best practice performance

frontier and evaluate the eﬃciency of a DMU relative to this frontier, DEA introduces a minimum of

production economic regularities, typically convexity, and uses linear or mixed integer programming to

capture the relationship between multiple inputs and outputs of a DMU. In this sense, and in the eyes of

the modeller, the method is well-deﬁned and several of the properties of the model will be understandable

from the production economic regularities. Still, from the point of view of the evaluated DMUs, the model

will appear very much like a black box. Understanding a multiple-input, multiple-output structure is inherently difficult. Also, in DEA, there is no explicit formula showing the impact of specific inputs

on speciﬁc outputs as in SFA or other econometrics based approaches. This has led some researchers to

look for extra information and structure in DEA models, most notably by viewing the black box as a network of more specific processes, cf. e.g. Cherchye et al. (2013); Färe and Grosskopf (2000); Kao (2009).

The black box nature of DEA models may lead to some algorithm aversion and mistrust in the model,

and to diﬃculties in deriving actionable information from the model beyond the eﬃciency scores. To

overcome this and to get insights into the functioning of a DEA model, there are several strands of

literature and tools that can be useful. The Multiple Criteria Decision Making (MCDM) literature has

developed several ways in which complicated sets of alternatives can be explored and presented to a

decision maker. Also, in DEA, there is already a considerable literature on ﬁnding targets that a ﬁrm

can investigate in attempts to ﬁnd attractive alternative production plans. Last, but not least, it may

be interesting to look for counterfactual explanations much like they are used in machine learning.

In this paper, we propose the use of counterfactual and target analyses to understand and explain

the eﬃciencies of individual DMUs, to learn about the estimated best practice technology, and to help

answer what-if questions that are relevant in operational, tactical and strategic planning eﬀorts (Bogetoft,

2012). In a DEA context, counterfactual and target analyses can help with learning, decision making

and incentive design. In terms of learning, the DMU may be interested in knowing what simple changes

in features (inputs and outputs) lead to a higher eﬃciency level. In the application investigated in our

numerical section, this can be, for instance, how many credit oﬃcers or tellers a bank branch should

remove to become fully eﬃcient. This may help the evaluated DMU learn about and gain trust in the

underlying modelling. In terms of decision making, targets and counterfactual explanations may help

guide the decision process by oﬀering the smallest, the most plausible and actionable, and the least

costly changes that lead to a desired boost in performance. How to define the least costly, or the most plausible and actionable, improvement paths depends on the context. In some cases it may be easier to

reduce all inputs more or less the same (lawn mowing), while in other cases certain inputs should be

reduced more aggressively than others, cf. Antle and Bogetoft (2019). Referring back to the application

in the numerical section, reducing the use of diﬀerent labor inputs could for example take into account

the power of diﬀerent labor unions and the capacity of management to struggle with multiple employee

groups simultaneously. Lastly, targets and counterfactual explanations may be useful in connection with

incentive provisions. DEA models are routinely used by regulators of natural monopoly networks to

incentivize cost reductions and service improvement, cf. e.g. Haney and Pollitt (2009) and later updates

in Agrell and Bogetoft (2017); Bogetoft (2012). Regulated ﬁrms will naturally look for the easiest way to

accommodate the regulator’s eﬃciency thresholds. Counterfactual explanations may in such cases serve

to guide the optimal strategic responses to the regulator’s requirements.

Unfortunately, it is not an entirely trivial task to properly determine targets and construct counterfactual explanations in a DEA context. We need to find alternative solutions that are in some sense

close to the existing input-output combination used by a DMU. This involves ﬁnding “close” alternatives

in the complement of a convex set (Thach, 1988). In this paper, we investigate diﬀerent ways to mea-

sure the closeness between a DMU and its counterfactual DMU, or the cost of moving from an existing

input-output profile to an alternative target. In particular, we suggest using combinations of the ℓ0, ℓ1, and ℓ2 norms. We also consider changes in both input and output features and show how to formulate

the problems in DEA models with diﬀerent returns to scale assumptions. We show how determining

targets and constructing counterfactual explanations leads to a bilevel optimization model that can be

reformulated as a Mixed Integer Convex Quadratic Problem with linear constraints. We illustrate our

approach on both a small numerical example and a large-scale real-world dataset involving bank

branches.

The outline of the paper is as follows. In Section 2 we review the relevant literature. In Section 3

we introduce the necessary DEA notation for constructing targets and counterfactual explanations, as

well as a small numerical example. In Section 4 we describe our bilevel optimization formulation and its

reformulation as a Mixed Integer Convex Quadratic Problem with linear constraints. In Section 5 we

illustrate our approach with real-world data on bank branches. We end the paper with conclusions in

Section 6. In the Appendix, we extend the analysis by investigating alternative returns to scale and by

investigating changes in the outputs rather than the inputs.

2 Background and Literature

In this section, we give some background on DEA benchmarking, in particular on directional and interac-

tive benchmarking, on target setting in DEA, and on counterfactual analysis from interpretable machine

learning.

Data Envelopment Analysis, DEA, was ﬁrst introduced in Charnes et al. (1978, 1979) as a tool for

measuring eﬃciency and productivity of decision making units, DMUs. The idea of DEA is to model the

production possibilities of the DMUs and to measure the performance of the individual DMUs relative

to the production possibility frontier. The modelling is based on observed practices that form activities

in a Linear Programming (LP) based activity analysis model.

Most studies use DEA models primarily to measure the relative eﬃciency of the DMUs. The bench-

marking framework, the LP based activity analysis model, does however allow us to explore a series of

other questions. In fact, the benchmarking framework can serve as a learning lab and decision support

tool for managers. In the DEA literature, this perspective has been emphasized by the idea of interactive

benchmarking. Interactive benchmarking and associated easy to use software has been used in a series

of applications and consultancy projects, cf. e.g. Bogetoft (2012). The idea is that a DMU can search

for alternative and attractive production possibilities and hereby learn about the technology, explore

possible changes and trade-oﬀs and look for least cost changes that allow for necessary performance im-

provements, cf. also our discussion of learning, decision making and incentive and regulation applications

in the introduction.

One way to illustrate the idea of interactive benchmarking is as in Figure 1 below. A DMU has used

two inputs to produce two outputs. Based on the data from other DMUs, an estimate of the best practice

technology has been established as illustrated by the piecewise linear input and output isoquants. The

DMU may now be interested in exploring alternative paths towards best practices. One possibility is to

save a lot of input 2 and somewhat less of input 1, i.e., to move in the direction dx illustrated by the arrow in the left panel. If the aim is to become fully efficient, this approach suggests that, instead of the present (input, output) combination (x, y), the DMU should consider the alternative (x̂, y). A similar logic could be used on the output side, keeping the inputs fixed, as illustrated in the right panel, where a proportional increase in the two outputs is sought. Of course, in reality, one can also combine changes in the inputs and outputs.

[Figure 1: two panels. Input side — axes Input 1 and Input 2, a piecewise linear input isoquant, the point x, the direction −dx, and the projected point x̂ = x − e dx. Output side — axes Output 1 and Output 2, a piecewise linear output isoquant, the point y, the direction dy, and the projected point ŷ = y + e dy.]

Figure 1: Directional search for an alternative production plan to (x, y) along (dx, dy) using (DIR)

Formally, the directional distance function approach, sometimes referred to as the excess problem, requires solving the following mathematical programming problem

max { e | (x − e dx, y + e dy) ∈ T* },                                  (DIR)

where x and y are the present values of the input and output vectors, dx and dy are the improvement directions in input and output space, T* is the estimated set of feasible (input, output) combinations, and e is the magnitude of the movement.
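As a concrete illustration, (DIR) is a linear program once T* is the CRS DEA technology. Below is a minimal sketch using scipy (assumed available); the four-DMU toy data and the chosen direction (dx, dy) are illustrative assumptions, not taken from the paper.

```python
# Sketch of the directional-distance LP (DIR) under a CRS DEA technology.
# Toy data and the direction (dx, dy) are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

X = np.array([[0.50, 1.00],      # observed inputs, one row per DMU
              [1.50, 0.50],
              [1.75, 1.25],
              [2.50, 1.25]])
Y = np.array([[1.0], [1.0], [1.0], [1.0]])   # observed outputs

x, y = X[2], Y[2]                            # DMU being evaluated
dx = np.array([1.0, 1.0])                    # reduce both inputs equally
dy = np.array([0.0])                         # keep outputs fixed

K, I = X.shape
O = Y.shape[1]
# decision variables: [e, lambda_1, ..., lambda_K]; maximize e
c = np.zeros(1 + K)
c[0] = -1.0                                  # linprog minimizes, so use -e
A_ub = np.vstack([
    np.hstack([dx.reshape(I, 1), X.T]),      # sum_k lam_k x^k + e dx <= x
    np.hstack([dy.reshape(O, 1), -Y.T]),     # e dy - sum_k lam_k y^k <= -y
])
b_ub = np.concatenate([x, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (1 + K))
e = res.x[0]
x_hat, y_hat = x - e * dx, y + e * dy        # projected plan (x - e dx, y + e dy)
```

Varying dx and dy traces out different points on the frontier, which is exactly the steering idea described above.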

In the DEA literature, the direction (dx, dy) is often treated as a given parameter, and the excess as one of many possible ways to measure the distance to the frontier. A few authors have advocated that some directions are more natural than others, and there have been attempts to endogenize the choice of this direction, cf. e.g. Bogetoft and Hougaard (1999); Färe et al. (2013, 2017); Petersen (2018); Zofío

et al. (2013). One can also think of the improvement directions as reﬂecting the underlying strategy of

the DMU or simply as a steering tool that the DMU uses to create one or more interesting points on the

frontier.

Figure 2 illustrates the real-world example involving bank branches from the application section.

The analysis is here done using the directional distance function approach (DIR) as implemented in the

so-called Interactive Benchmarking software, cf. Bogetoft (2012). The search “Direction” is chosen by

adjusting the horizontal handles for each input and output and is expressed in percentages of the existing

inputs and outputs. The resulting best practice alternative is illustrated in the “Benchmark” column.

We see that the DMU in this example expresses an interest in reducing Supervision and Credit personnel

but simultaneously seeks to increase the number of personal loan accounts.

Figure 2: Directional search in Interactive Benchmarking software. Real-world dataset of bank branches in Section 5

Applications of interactive benchmarking have typically been in settings where the DMU, in a trial-and-error like process, seeks alternative production plans. Such processes can certainly be useful in attempts to learn about and gain trust in the modelling, to guide decision making, and to find the performance-enhancing changes that a DMU may find relatively easy to implement. From the point of view of

Multiple Criteria Decision Making (MCDM) we can think of such processes as based on progressive articulation of preferences and alternatives, cf. e.g. the taxonomy of MCDM methods suggested in Rostami et al. (2017).

It is clear from this small example, however, that the use of an interactive process guided solely by

the DMU may not always be the best approach. If there are more than a few inputs and outputs, the

process can become diﬃcult to steer towards some underlying optimal compromise between the many

possible changes in inputs and outputs. In such cases, the so-called prior articulation of preferences may

be more useful. If the DMU can express its preferences for diﬀerent changes, e.g., as a cost of change

function C((x,y),(x∗,y∗)) giving the cost of moving from the present production plan (x,y) to any

new production plan (x∗,y∗), then a systematic search for the optimal change is possible. The approach

of this paper is based on this idea. We consider a class of cost functions and show how to ﬁnd optimal

changes in inputs and outputs using bilevel optimization. In this sense, it corresponds to endogenizing

the directional choice so as to make the necessary changes in inputs and outputs as small as possible. Of

course, by varying the parameters of the cost function, one can also generate a reasonably representative

set of alternative production plans that the DMU can then choose from. This would correspond to the

idea of a prior articulation of alternatives approach in the MCDM taxonomy.

The idea of introducing a cost of change function to guide the search for alternative production plans

is closely related to the idea of targets in DEA. At a general level, a target is here understood as an

alternative production plan that a DMU should move to.1 There has been a series of interesting DEA

papers on the determination of targets using the principle of least action, see for example Aparicio et al.

(2014) and the references in here. In Aparicio et al. (2014), the authors explicitly introduce the principle

of least action referring to the idea in physics that nature always ﬁnds the most eﬃcient course of action.

The general argument underlying these approaches is that an ineﬃcient ﬁrm should achieve technical

eﬃciency with a minimum amount of eﬀort. Diﬀerent solutions have been proposed using diﬀerent

distance measures or what we call the cost of change. In many papers, this corresponds to minimizing

the distance to the eﬃcient frontier in contrast to the traditional eﬃciency measurement problem, where

we are looking for the largest possible savings or the largest possible expansions of the services provided.

A good example is Aparicio and Pastor (2013). In this sense, our idea of ﬁnding close counterfactuals

1In Aparicio et al. (2007), the authors distinguish between setting targets and benchmarking in the sense that targets

are the coordinates of a projection point, which is not necessarily an observed DMU, whereas benchmarks are real observed

DMUs. This distinction can certainly be relevant in several contexts, but it is not how we use targets here. We use

benchmarking as the general term for relative performance comparison, and target to designate an alternative production

plan that a DMU should choose to improve performance in the easiest possible way.


fits nicely into the DEA literature.

The choice of targets has also been discussed in connection with the slack problem in radial DEA

measures. A Farrell projection may not lead to a Pareto eﬃcient point and in a second stage, it is

therefore common to discuss close alternatives that are fully Pareto eﬃcient. Again, diﬀerent solutions

– using for example constraints or the determination of all full facets – have been proposed, cf. Aparicio

et al. (2017). Identifying all facets and measuring distance to these like in Silva Portela et al. (2003) is

theoretically attractive but computationally cumbersome in most applications.

Our approach can be seen as a generalization of the literature on targets. In particular, if E* = 1

and the considered cost function coincides with a norm or with a typical technical eﬃciency measure,

then the previous DEA target approaches are particular cases of the general approach introduced in

this paper. We formulate the target setting problem for general cost-of-change functions using a bilevel

program,2 and we reformulate the constraints to get tractable mathematical optimization problems.

Using combinations of the ℓ0, ℓ1 and ℓ2 norms, as we do in the illustrations, the resulting problems are Mixed Integer Convex Quadratic Problems with linear constraints. It is also worth noting that we do not necessarily require the target to be Pareto efficient, allowing for the possibility that a DMU may not seek to become fully efficient but, for example, just 90% Farrell efficient, which also implies that

targets may be on non-full facets.

In interpretable machine learning (Du et al., 2019; Rudin et al., 2022), counterfactual analysis is used

to explain the predictions made for individual instances (Guidotti, 2022; Karimi et al., 2022; Wachter

et al., 2017). Machine learning approaches like Deep Learning, Random Forests, Support Vector Machines, and XGBoost are often seen as powerful tools in terms of learning accuracy but also as black

boxes in terms of how the model arrives at its outcome. Therefore, regulations from, among others, the EU are enforcing more transparency in the so-called field of algorithmic decision making (European Commission, 2020; Goodman and Flaxman, 2017). A plethora of tools is being developed in the

nascent ﬁeld of explainable artiﬁcial intelligence to help understand how tools in machine learning and

artiﬁcial intelligence make decisions (Lundberg and Lee, 2017; Martens and Provost, 2014; Molnar et al.,

2020). The focus of this paper is on counterfactual analysis tools. The starting point is an individual

instance for which the model predicts an undesired outcome. In counterfactual analysis, one is interested

in building an alternative instance, the so-called counterfactual instance, revealing how to change the

features of the current instance so that the model predicts a desired outcome for the counterfactual

instance. The counterfactual explanation problem is written as a mathematical optimization problem.

To deﬁne the problem, one needs to model the feasible space, a cost function measuring the cost of the

movement from the current instance to the counterfactual one, and a set of constraints that ensures that

the counterfactual explanation is predicted with the desired outcome. In general, the counterfactual

explanation problem reads as a constrained nonlinear problem but, for score-based classiﬁers and cost

functions deﬁned by a convex combination of the norms ℓ0,ℓ1and ℓ2, equivalent Mixed Integer Linear

Programming or Mixed Integer Convex Quadratic with Linear Constraints formulations can be deﬁned,

see, e.g., Carrizosa et al. (2024); Fischetti and Jo (2018); Parmentier and Vidal (2021).

In the following, we combine the ideas of DEA, least action targets, and counterfactual explanations.

We formulate and solve bilevel optimization models to determine “close” alternative production plans

or counterfactual explanations in DEA models that lead to desired relative performance levels and also

take into account the strategic preferences of the entity.

2The idea of using a bilevel linear programming approach has also appeared in the DEA literature. It should be noted

in particular that Aparicio et al. (2017) proposed to resort to a bilevel linear programming model when strictly eﬃcient

targets are to be identified using the Russell output measure to capture cost-of-change.


3 The Setting

We consider K + 1 DMUs (indexed by k), using I inputs, x^k = (x^k_1, ..., x^k_I)^⊤ ∈ R^I_+, to produce O outputs, y^k = (y^k_1, ..., y^k_O)^⊤ ∈ R^O_+. Hereafter, we will write (x^k, y^k) to refer to the production plan of DMU k, k = 0, 1, ..., K.

Let T be the technology set, with

T = {(x, y) ∈ R^I_+ × R^O_+ | x can produce y}.

We will initially estimate T by the classical DEA model. It determines the empirical reference technology T* as the smallest subset of R^I_+ × R^O_+ that contains the actual K + 1 observations and satisfies the classical DEA regularities of convexity, free disposability in inputs and outputs, and Constant Returns to Scale (CRS). It is easy to see that the estimated technology can be described as:

T*(CRS) = {(x, y) ∈ R^I_+ × R^O_+ | ∃ λ ∈ R^{K+1}_+ : x ≥ Σ_{k=0}^{K} λ_k x^k,  y ≤ Σ_{k=0}^{K} λ_k y^k}.

To measure the efficiency of a firm, we will initially use the so-called Farrell input-oriented efficiency. It measures the efficiency of a DMU, say DMU 0, as the largest proportional reduction E^0 of all its inputs x^0 that still allows the production of its present outputs y^0 in the technology T*. Hence, it is equal to the optimal solution value of the following LP formulation

min_{E, λ_0, ..., λ_K}  E                                               (DEA)

s.t.  E x^0 ≥ Σ_{k=0}^{K} λ_k x^k,

      y^0 ≤ Σ_{k=0}^{K} λ_k y^k,

      0 ≤ E ≤ 1,

      λ ∈ R^{K+1}_+.

This DEA model has K + 2 decision variables, I linear input constraints, and O linear output constraints. Hereafter, we will refer to the optimal objective value of (DEA), say E^0, as the efficiency of DMU 0.

In the following, and assuming that firm 0 with production plan (x^0, y^0) is not fully efficient, E^0 < 1, we will show how to calculate a counterfactual explanation with a desired efficiency level E* > E^0, i.e., the minimum changes needed in the inputs of the firm, x^0, in order to obtain an efficiency of E*. Given a cost function C(x^0, x̂) that measures the cost of moving from the present inputs x^0 to the new counterfactual inputs x̂, and a set X(x^0) defining the feasible space for x̂, the counterfactual explanation for x^0 is found by solving the following optimization problem:

min_{x̂}  C(x^0, x̂)

s.t.  x̂ ∈ X(x^0),

      (x̂, y^0) has at least an efficiency of E*.

With respect to C(x^0, x̂), different norms can be used to measure the difficulty of changing the inputs. A DMU may, for example, be interested in minimizing the sum of the squared deviations between the present and the counterfactual inputs. We model this using the squared Euclidean norm ℓ2². Likewise, there may be an interest in minimizing the absolute value of the deviations, which we can proxy using the ℓ1 norm, or the number of inputs changed, which we can capture with the ℓ0 norm. When it comes to X(x^0), this would include the nonnegativity of x̂, as well as domain-knowledge-specific constraints. With this approach, we detect the most important inputs in terms of the impact they have on the DMU's efficiency, with enough flexibility to consider different costs of change depending on the DMU's characteristics.

In the next section, we will show that finding counterfactual explanations involves solving a bilevel optimization problem: minimizing the changes in inputs while solving the above DEA problem at the same time. In the Appendix, we will also discuss how the counterfactual analysis approach can be extended to other DEA technologies and to other efficiency measures, such as the output-oriented Farrell efficiency.

Before turning to the details of the bilevel optimization problem, it is useful to illustrate the idea of counterfactual explanations using a small numerical example. Suppose we have four firms with the inputs, outputs, and Farrell input efficiencies as in Table 1. The efficiency has been calculated by solving the classical DEA model with CRS, namely (DEA). In this example, firms 1 and 2 are fully efficient, whereas firms 3 and 4 are not.

Firm   x1     x2     y   E
1      0.50   1.00   1   1.00
2      1.50   0.50   1   1.00
3      1.75   1.25   1   0.59
4      2.50   1.25   1   0.50

Table 1: Inputs, outputs and corresponding Farrell input-efficiency of 4 different firms
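The efficiencies in Table 1 can be reproduced directly from the (DEA) LP. Below is a minimal sketch using scipy (assumed available); the helper name farrell_input_efficiency is ours, not from the paper.

```python
# Sketch of the input-oriented Farrell model (DEA) under CRS, applied to the
# four firms of Table 1. Variable order in the LP: [E, lam_1, ..., lam_K].
import numpy as np
from scipy.optimize import linprog

X = np.array([[0.50, 1.00],   # x1, x2 for firms 1..4 (Table 1)
              [1.50, 0.50],
              [1.75, 1.25],
              [2.50, 1.25]])
Y = np.array([[1.0], [1.0], [1.0], [1.0]])   # single output y = 1

def farrell_input_efficiency(x_eval, y_eval):
    """min E s.t. E x >= sum_k lam_k x^k, y <= sum_k lam_k y^k, lam >= 0."""
    K, I = X.shape
    O = Y.shape[1]
    c = np.zeros(1 + K)
    c[0] = 1.0                                        # minimize E
    A_ub = np.vstack([
        np.hstack([-x_eval.reshape(I, 1), X.T]),      # sum lam x^k - E x <= 0
        np.hstack([np.zeros((O, 1)), -Y.T]),          # y - sum lam y^k <= 0
    ])
    b_ub = np.concatenate([np.zeros(I), -y_eval])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] + [(0, None)] * K)
    return res.x[0]

E3 = farrell_input_efficiency(X[2], Y[2])    # firm 3: about 0.59
x_farrell = (E3 / 0.8) * X[2]                # Farrell-style target with E* = 0.8
```

Under CRS, scaling firm 3's inputs by E^0/E* yields an efficiency of exactly E*, since radial efficiency is inversely proportional to a proportional input scaling; this reproduces the Farrell row of Table 2 below.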

First, we want to know the changes needed in x3 for firm 3 to have a new efficiency E∗ of at least 80%. Since we only have two inputs, we can illustrate this graphically as in Figure 3a. The results are shown in Table 2 for different cost functions. In all cases we get exactly 80% efficiency with the new inputs. We see from column ℓ2² that the Farrell solution is further away from the original inputs than the counterfactual solution based on the Euclidean norm. To the extent that difficulties of change are captured by the ℓ2² norm, we can conclude that the Farrell solution is not ideal. Moreover, in the Farrell solution one must by definition change both inputs, see column ℓ0. Using a cost function combining the ℓ0 norm and the squared Euclidean norm, denoted by ℓ0+ℓ2, one penalizes the number of inputs changed. With this we detect the one input that should change in order to obtain a higher efficiency, namely the second input. In contexts like negotiations with various input suppliers, it is often more practical to focus negotiations on just one or a select few inputs, instead of dealing with all inputs at the same time.
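As a quick check of the ℓ0+ℓ2 row of Table 2, the efficiency of the counterfactual (1.75, 0.69) can be verified geometrically. On the unit isoquant, the facet spanned by firms 1 and 2 is the line through (0.5, 1) and (1.5, 0.5), i.e., x2 = 1.25 − 0.5 x1, and the Farrell efficiency is the scaling t at which the ray through the counterfactual meets this facet:

```latex
% Ray through (1.75, 0.69) intersected with the facet x_2 = 1.25 - 0.5 x_1:
0.69\,t \;=\; 1.25 - 0.5\cdot 1.75\,t
\quad\Longrightarrow\quad
t \;=\; \frac{1.25}{0.69 + 0.875} \;\approx\; 0.80 .
```

Similarly, the Farrell row of Table 2 is simply the radial scaling (E0/E∗)·x0 = (0.588/0.8)·(1.75, 1.25) ≈ (1.29, 0.92).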

Cost function   x̂1     x̂2     y   E     ℓ2²    ℓ0
Farrell         1.29   0.92   1   0.8   0.32   2
ℓ0+ℓ2           1.75   0.69   1   0.8   0.31   1
ℓ2              1.53   0.80   1   0.8   0.25   2

Table 2: Counterfactual explanations for firm 3 in Table 1 imposing E∗ = 0.8 and different cost functions

Let us now focus on firm 4 and again find a counterfactual instance with at least 80% efficiency. The results are shown in Table 3 and Figure 3b. Notice how in the Farrell case one obtains again the farthest solution, and also the least sparse of the three. As for the counterfactual explanations with our methodology, the inputs nearest to the original DMU that give us the desired efficiency lie on a non-full facet of the efficiency frontier. By using the Farrell measure for the desired efficiency level, we only need to change one input, namely the second input, and can have "slack" in the first input. We here deviate from Aparicio et al. (2017), in which the authors look for targets on the strongly efficient frontier, i.e., without slack.

Cost function   x̂1     x̂2     y   E     ℓ2²    ℓ0
Farrell         1.56   0.78   1   0.8   1.10   2
ℓ0+ℓ2           2.50   0.63   1   0.8   0.39   1
ℓ2              2.50   0.63   1   0.8   0.39   1

Table 3: Counterfactual explanations for firm 4 in Table 1 imposing E∗ = 0.8 and different cost functions

(a) Explanations for firm 3   (b) Explanations for firm 4

Figure 3: Counterfactual explanations for firms 3 and 4 in Tables 2 and 3, respectively, imposing E∗ = 0.8 and different cost functions

In Figure 3, the space where we search for the counterfactual explanation is shaded. Although in these illustrations the frontier is explicitly given, in general the frontier points are convex combinations of the observed DMUs, and obtaining the isoquants is not easy. Therefore, when finding counterfactual explanations, a bilevel optimization problem must be solved to search for "close" inputs in the complement of a convex set.

4 Bilevel optimization for counterfactual analysis in DEA

Suppose DMU 0 is not fully efficient, i.e., the optimal objective value of Problem (DEA) is E0 < 1. In this section, we formulate the counterfactual explanation problem in DEA, i.e., the problem that calculates the minimum-cost changes in the inputs x0 that make DMU 0 attain a higher efficiency. Let x̂ be the new inputs of DMU 0 that would make it at least E∗ efficient, with E∗ > E0. With this, we have defined the counterfactual instance as the one obtained by changing the inputs, but in the same vein, we could define it by changing the outputs. This alternative output-based problem is studied in the Appendix.


Since the values of the inputs are to be changed, the efficiency of the new production plan (x̂, y0) has to be calculated using Problem (DEA). The counterfactual explanation problem in DEA reads as follows:

\[
\begin{aligned}
\min_{\hat{x}}\quad & C(x^{0},\hat{x}) && (1)\\
\text{s.t.}\quad & \hat{x}\in\mathbb{R}^{I}_{+} && (2)\\
& E\ge E^{*} && (3)\\
& E\in\operatorname*{arg\,min}_{\bar{E},\lambda_{0},\dots,\lambda_{K}}\Big\{\bar{E}\,:\,\bar{E}\hat{x}\ge\textstyle\sum_{k=0}^{K}\lambda_{k}x^{k},\ \ y^{0}\le\textstyle\sum_{k=0}^{K}\lambda_{k}y^{k},\ \ \bar{E}\ge 0,\ \ \lambda\in\mathbb{R}^{K+1}_{+}\Big\}, && (4)
\end{aligned}
\]

where in the upper-level problem (1) we minimize the cost of changing the inputs of firm 0 from x0 to x̂, ensuring nonnegativity of the inputs, as in constraint (2), and that the efficiency is at least E∗, as in constraint (3). The lower-level problem (4) ensures that the efficiency of (x̂, y0) is correctly calculated. Therefore, as opposed to counterfactual analysis in interpretable machine learning, here we are confronted with a bilevel optimization problem. Notice also that to calculate the efficiency in the lower-level problem (4), the technology is already fixed, and the new DMU (x̂, y0) does not take part in its construction.

In what follows, we reformulate the bilevel optimization problem (1)-(4) as a single-level model, by exploiting the optimality conditions of the lower-level problem. This can be done for convex lower-level problems that satisfy Slater's conditions, e.g., if our lower-level problem were linear. In our case, however, not all the constraints are linear, since in (4) we have the product of decision variables \(\bar{E}\hat{x}\). To handle this, we define new decision variables, namely, F = 1/E and βk = λk/E, for k = 0, ..., K. Thus, (1)-(4) is equivalent to:

\[
\begin{aligned}
\min_{\hat{x},F}\quad & C(x^{0},\hat{x})\\
\text{s.t.}\quad & \hat{x}\in\mathbb{R}^{I}_{+}\\
& F\le F^{*}\\
& F\in\operatorname*{arg\,max}_{\bar{F},\beta}\Big\{\bar{F}\,:\,\hat{x}\ge\textstyle\sum_{k=0}^{K}\beta_{k}x^{k},\ \ \bar{F}y^{0}\le\textstyle\sum_{k=0}^{K}\beta_{k}y^{k},\ \ \bar{F}\ge 0,\ \ \beta\in\mathbb{R}^{K+1}_{+}\Big\}. && (5)
\end{aligned}
\]

This equivalent bilevel optimization problem can now be reformulated as a single-level model. The new lower-level problem in (5) can be seen as the x̂-parametrized problem:

\[
\begin{aligned}
\max_{F,\beta}\quad & F && (6)\\
\text{s.t.}\quad & \hat{x}\ge\textstyle\sum_{k=0}^{K}\beta_{k}x^{k} && (7)\\
& Fy^{0}\le\textstyle\sum_{k=0}^{K}\beta_{k}y^{k} && (8)\\
& F\ge 0 && (9)\\
& \beta\ge 0. && (10)
\end{aligned}
\]

The Karush-Kuhn-Tucker (KKT) conditions, which include primal and dual feasibility, stationarity and complementarity conditions, are necessary and sufficient to characterize an optimal solution. Thus, we can replace problem (5) by its KKT conditions. Primal feasibility is given by (7)-(10). Dual feasibility is given by:

\[
\gamma_{I},\gamma_{O},\delta,\mu\ge 0, \qquad (11)
\]

for the dual variables associated with constraints (7)-(10), where \(\gamma_{I}\in\mathbb{R}^{I}_{+}\), \(\gamma_{O}\in\mathbb{R}^{O}_{+}\), \(\delta\in\mathbb{R}_{+}\), \(\mu\in\mathbb{R}^{K+1}_{+}\). The stationarity conditions are as follows:

\[
\begin{aligned}
\gamma_{O}^{\top}y^{0}-\delta &= 1 && (12)\\
\gamma_{I}^{\top}x^{k}-\gamma_{O}^{\top}y^{k}-\mu_{k} &= 0 \qquad k=0,\dots,K. && (13)
\end{aligned}
\]

Lastly, we need the complementarity conditions for all constraints (7)-(10). For constraint (7), we have:

\[
\gamma_{I}^{i}=0 \quad\text{or}\quad \hat{x}_{i}-\sum_{k=0}^{K}\beta_{k}x^{k}_{i}=0, \qquad i=1,\dots,I. \qquad (14)
\]

In order to model this disjunction, we introduce binary variables ui ∈ {0, 1}, i = 1, ..., I, and the following constraints using the big-M method:

\[
\gamma_{I}^{i}\le M_{I}u_{i}, \qquad \hat{x}_{i}-\sum_{k=0}^{K}\beta_{k}x^{k}_{i}\le M_{I}(1-u_{i}), \qquad i=1,\dots,I, \qquad (15)
\]

where MI is a sufficiently large constant.
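The disjunction-to-big-M pattern used in (15) can be sketched on a toy problem. The example below is not the counterfactual model itself, just the modelling trick: a binary variable u with constraints a ≤ M·u and b ≤ M·(1 − u) forces "a = 0 or b = 0". It assumes SciPy's `milp` interface (SciPy ≥ 1.9):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy disjunction "a = 0 or b = 0": maximize a + b with a, b in [0, 3].
# Variables: (a, b, u), u binary.
M = 1000.0
c = np.array([-1.0, -1.0, 0.0])          # milp minimizes, so negate a + b
A = np.array([
    [1.0, 0.0, -M],                      # a - M*u <= 0      (a <= M*u)
    [0.0, 1.0,  M],                      # b + M*u <= M      (b <= M*(1-u))
])
res = milp(c,
           constraints=LinearConstraint(A, -np.inf, [0.0, M]),
           integrality=np.array([0, 0, 1]),   # only u is integer
           bounds=Bounds([0, 0, 0], [3, 3, 1]))
a, b, u = res.x
# At the optimum, one of a, b is forced to 0 and the other hits its bound 3.
```

The big-M constraints (15)-(17) in the text apply exactly this pattern to each complementarity pair of the KKT system.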

The same can be done for the complementarity condition for constraint (8), introducing binary variables vo ∈ {0, 1}, o = 1, ..., O, a big-M constant MO, and constraints:

\[
\gamma_{O}^{o}\le M_{O}v_{o}, \qquad -Fy^{0}_{o}+\sum_{k=0}^{K}\beta_{k}y^{k}_{o}\le M_{O}(1-v_{o}), \qquad o=1,\dots,O. \qquad (16)
\]

The complementarity condition for constraint (10) would be the disjunction βk = 0 or μk = 0. Using the stationarity condition (13) and again the big-M method with binary variables wk ∈ {0, 1}, k = 0, ..., K, and big-M constant Mf, one obtains the constraints:

\[
\beta_{k}\le M_{f}w_{k}, \qquad \gamma_{I}^{\top}x^{k}-\gamma_{O}^{\top}y^{k}\le M_{f}(1-w_{k}), \qquad k=0,\dots,K. \qquad (17)
\]

Finally, for constraint (9) the complementarity condition yields F = 0 or δ = 0. Remember that F = 1/E with 0 ≤ E ≤ 1; thus F cannot be zero by definition and we must impose δ = 0. Using stationarity condition (12), this yields:

\[
\gamma_{O}^{\top}y^{0}=1. \qquad (18)
\]

We now reflect on the meaning of these constraints. Notice that constraints (15) and (16) model the slacks of the inputs and outputs, respectively, while constraint (17) models the firms that define the frontier, i.e., the firms with which DMU 0 is to be compared. If binary variable ui = 1, then there is no slack in input i, i.e., \(\hat{x}_{i}=\sum_{k=0}^{K}\beta_{k}x^{k}_{i}\), whereas if ui = 0 there is. The same holds for binary variable vo, namely, it indicates whether there is slack in output o. On the other hand, when wk = 1, the dual constraint holds with equality, \(\gamma_{I}^{\top}x^{k}=\gamma_{O}^{\top}y^{k}\), i.e., firm k is fully efficient and is used to define the efficiency of the counterfactual instance. If wk = 0, then βk = 0, and firm k is not used to define the efficiency of the counterfactual instance. Let us go back to the example in the previous section, with four firms with 2 inputs and 1 output and several choices of the cost function C of changing the inputs. When C = ℓ2², we can see that firm 3 is compared against firms 1 and 2, while firm 4 is compared against firm 2 only.

Notice that μk is only present in (13), and can thus be eliminated. In addition, we know that δ = 0. Therefore, we can transform the stationarity conditions (12) and (13) into

\[
\begin{aligned}
\gamma_{O}^{\top}y^{0} &= 1 && (19)\\
\gamma_{I}^{\top}x^{k}-\gamma_{O}^{\top}y^{k} &\ge 0 \qquad k=0,\dots,K && (20)\\
\gamma_{I},\gamma_{O} &\ge 0, && (21)
\end{aligned}
\]

which are exactly the constraints in the dual DEA model for the Farrell output efficiency.

The new reformulation of the counterfactual explanation problem in DEA is as follows:

\[
\begin{aligned}
\min_{\hat{x},F,\beta,\gamma_{I},\gamma_{O},u,v,w}\quad & C(x^{0},\hat{x})\\
\text{s.t.}\quad & F\le F^{*}\\
& \hat{x}\in\mathbb{R}^{I}_{+}\\
& u,v,w\in\{0,1\}\\
& (7)\text{-}(10) && \text{primal}\\
& (19)\text{-}(21) && \text{dual}\\
& (15)\text{-}(16) && \text{slacks}\\
& (17) && \text{frontier.}
\end{aligned}
\]

So far, we have not been very specific about the objective function C(x0, x̂). Different functional forms can be introduced, and this may require the introduction of further variables to implement them. In Section 3, we approximated the firm's cost of change using combinations of the ℓ0 norm, the ℓ1 norm, and the squared ℓ2 norm. These are widely used in machine learning when close counterfactuals are sought in an attempt to understand how to obtain a more attractive outcome (Carrizosa et al., 2023). The ℓ0 "norm", which strictly speaking is not a norm in the mathematical sense, counts the number of dimensions that have to be changed. The ℓ1 norm sums the absolute values of the deviations. Lastly, ℓ2² is the squared Euclidean norm, which squares the deviations.

As a starting point, we therefore propose the following objective function:

\[
C(x^{0},\hat{x}) \;=\; \nu_{0}\,\lVert x^{0}-\hat{x}\rVert_{0} \;+\; \nu_{1}\,\lVert x^{0}-\hat{x}\rVert_{1} \;+\; \nu_{2}\,\lVert x^{0}-\hat{x}\rVert_{2}^{2}, \qquad (22)
\]

where ν0, ν1, ν2 ≥ 0. Taking into account that there may be specific input prices and output prices, or that inputs may have varying degrees of difficulty to be changed, one can consider giving different weights to the deviations in each of the inputs.

In order to have a smooth expression of objective function (22), additional decision variables and constraints have to be added to the counterfactual explanation problem in DEA. To linearize the ℓ0 norm, binary decision variables ξi are introduced. For input i, ξi = 1 models x0i ≠ x̂i, i = 1, ..., I. Using the big-M method, the following constraints are added to our formulation:

\[
-M_{\mathrm{zero}}\,\xi_{i} \le x^{0}_{i}-\hat{x}_{i} \le M_{\mathrm{zero}}\,\xi_{i}, \qquad i=1,\dots,I \qquad (23)
\]
\[
\xi_{i}\in\{0,1\}, \qquad i=1,\dots,I, \qquad (24)
\]

where Mzero is a sufficiently large constant.

For the ℓ1 norm, we introduce continuous decision variables ηi ≥ 0, i = 1, ..., I, to measure the absolute values of the deviations, ηi = |x0i − x̂i|, which is naturally implemented by the following constraints:

\[
\begin{aligned}
\eta_{i} &\ge x^{0}_{i}-\hat{x}_{i}, && i=1,\dots,I && (25)\\
-\eta_{i} &\le x^{0}_{i}-\hat{x}_{i}, && i=1,\dots,I && (26)\\
\eta_{i} &\ge 0, && i=1,\dots,I. && (27)
\end{aligned}
\]
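The η-linearization can be sketched in isolation with a small LP. The "budget" constraint x̂1 + x̂2 ≤ 2 below is a hypothetical stand-in for the DEA constraints, used only to force the inputs to move; at the optimum, Σηi equals the ℓ1 distance actually needed:

```python
import numpy as np
from scipy.optimize import linprog

# Variables: (xhat_1, xhat_2, eta_1, eta_2); original inputs x0 = (1.75, 1.25).
x0 = np.array([1.75, 1.25])
c = np.array([0.0, 0.0, 1.0, 1.0])       # minimize eta_1 + eta_2
A_ub = np.array([
    [ 1.0,  1.0,  0.0,  0.0],            # xhat_1 + xhat_2 <= 2 (toy constraint)
    [-1.0,  0.0, -1.0,  0.0],            # eta_1 >= x0_1 - xhat_1   (25)
    [ 1.0,  0.0, -1.0,  0.0],            # eta_1 >= xhat_1 - x0_1   (26)
    [ 0.0, -1.0,  0.0, -1.0],            # eta_2 >= x0_2 - xhat_2   (25)
    [ 0.0,  1.0,  0.0, -1.0],            # eta_2 >= xhat_2 - x0_2   (26)
])
b_ub = np.array([2.0, -x0[0], x0[0], -x0[1], x0[1]])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
# x0 sums to 3.0, so reaching the budget of 2 requires an l1 move of 1.0.
```

Because the η variables are minimized, the two one-sided constraints bind at |x0i − x̂i| at the optimum, which is what makes this standard linearization exact.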

Thus, the counterfactual explanation problem in DEA with cost function C in (22), hereafter (CEDEA), reads as follows:

\[
\begin{aligned}
\min_{\hat{x},F,\beta,\gamma_{I},\gamma_{O},u,v,w,\eta,\xi}\quad & \nu_{0}\sum_{i=1}^{I}\xi_{i} \;+\; \nu_{1}\sum_{i=1}^{I}\eta_{i} \;+\; \nu_{2}\sum_{i=1}^{I}\eta_{i}^{2} && \text{(CEDEA)}\\
\text{s.t.}\quad & F\le F^{*}\\
& \hat{x}\in\mathbb{R}^{I}_{+}\\
& u,v,w\in\{0,1\}\\
& (7)\text{-}(10),\ (15)\text{-}(17),\ (19)\text{-}(21),\\
& (23)\text{-}(24),\ (25)\text{-}(27).
\end{aligned}
\]

Notice that in Problem (CEDEA) we assumed X(x0) = R^I_+ as the feasible space for x̂. Other relevant constraints for the counterfactual inputs could easily be added, e.g., bounds or relative bounds on the inputs, or inputs that cannot be changed in the short run, say capital expenses, or that represent environmental conditions beyond the control of the DMU.

In the case where only the ℓ0 and ℓ1 norms are considered, i.e., ν2 = 0, the objective function as well as the constraints are linear, while we have both binary and continuous decision variables. Therefore, Problem (CEDEA) can be solved using a Mixed Integer Linear Programming (MILP) solver. Otherwise, when ν2 > 0, Problem (CEDEA) is a Mixed Integer Convex Quadratic model with linear constraints, which can be solved with standard optimization packages. When all three norms are used, Problem (CEDEA) has 3I + K + 2 + O continuous variables and 2I + O + K + 1 binary decision variables. It has 7I + 3O + 3K + 5 constraints, plus the non-negativity and binary nature of the variables. The computational experiments show that this problem can be solved efficiently for our real-world dataset.
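Plugging the dimensions of the banking instance of Section 5 (I = 5 inputs, O = 3 outputs, K + 1 = 267 firms) into these counting formulas gives a concrete sense of the model size:

```python
# Size of Problem (CEDEA) from the counts above, for the banking instance.
I, O, K = 5, 3, 266                       # K + 1 = 267 firms

n_cont = 3 * I + K + 2 + O                # continuous variables
n_bin = 2 * I + O + K + 1                 # binary variables
n_cons = 7 * I + 3 * O + 3 * K + 5        # constraints (excl. sign/binary)

print(n_cont, n_bin, n_cons)              # 286 280 847
```

A mixed-integer model with a few hundred binaries and under a thousand linear constraints is well within reach of modern MILP/MIQP solvers, which is consistent with the solution times reported for the application.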

We can think of the objective function C in different ways.

One possibility is to see it as an instrument to explore the production possibilities. The use of a combination of the ℓ0, ℓ1 and ℓ2 norms seems natural here. Possible extensions could involve other ℓp norms, \(\lVert x^{0}-\hat{x}\rVert_{p}:=\big(\sum_{i=1}^{I}\lvert x^{0}_{i}-\hat{x}_{i}\rvert^{p}\big)^{1/p}\). For all p ∈ [1, ∞), ℓp is convex. This makes the use of ℓp norms convenient in generalizations of Problem (CEDEA). Of course, arbitrary ℓp norms may lead to more complicated implementations in existing software, since the objective function may no longer be quadratic.

Closely related to the instrumental view of the objective function is the idea of approximations. At least as a reasonable initial approximation of more complicated functions, many objective functions C can be approximated by the form in (22).

To end, one can link the form of C more closely to economic theory. In the economic literature there have been many studies of factor adjustment costs. It is commonly believed that firms change their demand for inputs only gradually and with some delay, cf. e.g. Hamermesh and Pfann (1999). For labor inputs, the factor adjustment costs include disruptions to production occurring when changing employment causes workers' assignments to be rearranged. Laying off or hiring new workers is also costly. There are search costs (advertising, screening, and processing new employees); the cost of training (including disruptions to production as previously trained workers' time is devoted to on-the-job instruction of new workers); severance pay (mandated and otherwise); and the overhead cost of maintaining the part of the personnel function dealing with recruitment and worker outflows. Following again Hamermesh and Pfann (1999), the literature on both labor and capital goods adjustments has overwhelmingly relied on one form of C, namely that of symmetric convex adjustment costs, much like we use in (22). Indeed, in the case of only one production factor, the most widely used functional form is simply the quadratic one. Hall (2004) and several others have tried to estimate the costs of adjusting labor and capital inputs. Using a Cobb-Douglas production function, and absent adjustment costs and changes in the ratios of factor prices, an increase in demand or in another determinant of industry equilibrium would cause factor inputs to change in the same proportion as outputs. Adjustment costs are introduced as reductions in outputs and are assumed to depend on the squared growth rates of labor and capital inputs: the larger the percentage change, the larger the adjustment costs.

Another economic approach to cost-of-change modelling is to think of habits. In firms, as in private life, habits are useful. In the performance of many tasks, including complicated ones, it is easiest to go into automatic mode and let a behavior unfold. When an efficiency requirement is introduced, habits may need to change, and this is costly. The relevance and strength of habit formation has also been studied empirically using panel data, cf. e.g. Dynan (2000) and the references therein. Habit formation should ideally be considered in a dynamic framework. To keep it simple, we might consider two periods: the past, where x0 was used, and the present, where x̂ is consumed. The utility in period two will then typically depend on the difference or ratio of present to past consumption, x̂ − x0 or, in the unidimensional case, x̂/x0. Examples of functional forms one can use are provided in, for example, Fuhrer (2000).

5 A banking application

In this section, we illustrate our methodology using real-world data on bank branches, Schaffnit et al. (1997), by constructing a collection of counterfactual explanations for each of the inefficient firms that can help them learn about the DEA benchmarking model and how they can improve their efficiency.

The data is described in more detail in Section 5.1, where we consider a model of bank branch production with I = 5 inputs and O = 3 outputs, and thus a production possibility set in R^8_+, spanned by K + 1 = 267 firms. In Section 5.2, we will focus on changing the inputs, and therefore the counterfactual explanations will be obtained with Problem (CEDEA). We will discuss the results obtained with different cost-of-change functions C, reflecting the effort an inefficient firm will need to spend to change to its counterfactual instance, and different desired levels of efficiency E∗. The Farrell projection discussed in Section 3 is added for reference. The counterfactual analysis sheds light on the nature of the DEA benchmarking model, which is otherwise hard to comprehend because of the many firms, inputs, and outputs involved in the construction of the technology.


All optimization models have been implemented in Python 3.8, using Gurobi 9.0 as the solver (Gurobi Optimization, 2021). We have solved Problem (CEDEA) with MI = MO = Mf = 1000 and Mzero = 1. The validity of Mzero = 1 will be shown below. Our numerical experiments have been conducted on a PC with an Intel Core i7-1065G7 processor at 1.30 GHz and 16 GB of RAM, running a 64-bit operating system.

5.1 The data

The data consist of five staff categories as inputs and three different types of outputs in the Ontario branches of a large Canadian bank. The inputs are measured as full-time equivalents (FTEs), and the outputs are the average monthly counts of the different transactions and maintenance activities. Observations with input values equal to 0 are removed, leaving us with a dataset of 267 branches. Summary statistics are provided in Table 4.

                           Mean   Min    Max     Std. dev.
INPUTS
Teller                     5.83   0.49   39.74   3.80
Typing                     1.05   0.03   22.92   1.84
Accounting & ledgers       4.69   0.80   65.93   5.13
Supervision                2.05   0.43   38.29   2.66
Credit                     4.40   0.35   55.73   6.19
OUTPUTS
Term accounts              2788   336    22910   2222
Personal loan accounts     117    0      1192    251
Commercial loan accounts   858    104    8689    784

Table 4: Descriptive statistics of the Canadian bank branches dataset in Schaffnit et al. (1997)

After calculating all the efficiencies through Problem (DEA), 236 of the 267 firms turn out to be inefficient. Out of those, 219 firms have an efficiency below 90%, 186 below 80%, 144 below 70%, 89 below 60%, and 49 below 50%.

5.2 Counterfactual analysis of bank branches

To examine the inefficient firms, we will determine counterfactual explanations for them. Prior to that, we have divided each input by its maximum value across all firms. We notice that this has no impact on the solution, since DEA models are invariant to linear transformations of inputs and outputs. Also, this makes the choice Mzero = 1 valid, since the values of all inputs are upper bounded by 1.
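The normalization step is a one-liner; a sketch with made-up staffing numbers (the matrix below is illustrative, not the bank data):

```python
import numpy as np

# Hypothetical input matrix (rows: firms, columns: the five staff inputs).
X = np.array([
    [ 5.0, 1.0, 4.0, 2.0, 4.0],
    [ 2.5, 0.5, 8.0, 1.0, 6.0],
    [10.0, 2.0, 2.0, 4.0, 2.0],
])

# Divide each input (column) by its maximum across firms.
X_norm = X / X.max(axis=0)

# All normalized inputs now lie in (0, 1], so M_zero = 1 is a valid big-M
# for constraints (23): |x0_i - xhat_i| can never exceed 1.
```

Since DEA efficiencies are invariant to this column scaling, the normalization changes nothing but the numerical range of the big-M constraints.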

We will use three different cost functions, obtained by changing the values of the parameters ν0, ν1, ν2 in (22), as well as two different values of the desired efficiency of the counterfactual instance, namely E∗ = 1 and E∗ = 0.8. In the first implementation of the cost function, which we denote ℓ0 + (ℓ2), we use ν0 = 1, ν2 = 10^-3 and ν1 = 0, i.e., we seek to minimize the ℓ0 norm and only introduce a little bit of the squared Euclidean norm to ensure a unique solution of Problem (CEDEA). In the second implementation, which we call ℓ0 + ℓ2, we take ν0 = 1, ν1 = 0 and ν2 = 10^5, such that the squared Euclidean norm has a higher weight than in cost function ℓ0 + (ℓ2). Finally, we denote by ℓ2 the cost function that focuses on the minimization of the squared Euclidean norm only, i.e., ν0 = ν1 = 0 and ν2 = 1. A summary of all the cost functions used can be seen in Table 5. Calculations were also done for the ℓ1 norm, i.e., ν0 = ν2 = 0 and ν1 = 1, but as the solutions found were similar to those for cost function ℓ0 + ℓ2, they are omitted for the sake of clarity of presentation. We start the discussion with the counterfactual explanations obtained with E∗ = 1, as summarized in Figures 4-5 and Tables 6-7. We then move on to a less demanding desired efficiency, namely E∗ = 0.8. These results are summarized in Figures 6-7 and Tables 8-9.

Figure 4: Counterfactual explanations for firm 238 with Problem (CEDEA) and desired efficiency E∗ = 1

Cost function   ν0   ν1   ν2
ℓ0 + (ℓ2)       1    0    10^-3
ℓ0 + ℓ2         1    0    10^5
ℓ2              0    0    1

Table 5: Value of the parameters ν0, ν1 and ν2 in (22) for the different cost functions used

Let us first visualize the counterfactual explanations for a specific firm. Consider, for instance, firm 238, which has an original efficiency of E0 = 0.72. We can visualize the different counterfactual explanations generated by the different cost functions using a spider chart, see Figure 4. In addition to the counterfactual explanations obtained with Problem (CEDEA), we also illustrate the so-called Farrell projection. In the spider chart, each axis represents an input, and the original values of the firm correspond to the outer circle. Figure 4 shows the different changes needed depending on the cost function used. With ℓ0 + (ℓ2), where the focus is mainly on penalizing the number of inputs changed, we see that only the typing personnel has to be changed, leaving the rest of the inputs unchanged. Nevertheless, because only one input is changed, it has to be decreased by 60% from its original value. The Farrell solution decreases the typing personnel by 28% of its value, but to compensate, it changes the remaining four inputs proportionally. When the ℓ0 + ℓ2 cost function is used, the typing personnel still needs to be changed, but the change is smaller, this time 51% of its value. The supervision personnel also needs to be decreased, by 16% of its value, while the rest of the inputs remain untouched. Increasing the weight on the Euclidean norm in the cost function gives us the combination of the two inputs that are crucial to change in order to gain efficiency, as well as the exact amount by which they need to be reduced. Finally, using only the Euclidean norm, the typing, supervision and credit personnel are the inputs to be changed; the typing input is reduced slightly less than with ℓ0 + ℓ2, in exchange for reducing the credit input by just 1%. Notice that the teller and the accounting and ledgers personnel are never changed in the counterfactual explanations generated by our methodology, which leads us to think that these inputs are not the ones behind firm 238's original low efficiency.

The analysis above is for a single firm. We now present some statistics about the counterfactual explanations obtained for all the inefficient firms. Recall that these are 236 firms, and that Problem (CEDEA) has been solved for each of them. In Table 6 we show, for each cost function, how often an input has to be changed. For instance, the value 0.09 in the last row of the Teller column shows that in 9% of all firms we have to change the number of tellers when the aim is to find a counterfactual instance using the Euclidean norm. When more weight is given to the ℓ0 norm, fewer inputs are changed. Indeed, for the Teller column, with ℓ0 + ℓ2, 3% of all firms change it, instead of 9%, and this number decreases to 1% when ℓ0 + (ℓ2) is used. The same pattern can be observed in all inputs, and it is particularly notable for the Acc. and Ledgers personnel, which goes from changing in more than half of the banks with the Euclidean norm to changing in only 14% of the firms. The last column of Table 6, Mean ℓ0(x − x̂), shows how many inputs are changed on average when we use the different cost functions. With ℓ0 + (ℓ2), only one input has to be decreased; thus, with this cost function one detects the crucial input to be modified to become fully efficient, leaving the rest fixed. In general, the results show that, for the inefficient firms, the most common changes leading to full efficiency are to reduce the number of typists and the number of credit officers. The excess of typists is likely related to the institutional setting. Bank branches need the so-called typists for judicial regulations, but they only need the services to a limited degree, see also Table 4. In such cases, it may be difficult to match the full-time equivalents employed precisely to the need. The excess of credit officers is more surprising since, in particular, they are one of the best paid personnel groups.

In Table 7, we look at the size of the changes, and not just whether a change has to take place. The value 0.43 in the first row of the Teller column indicates that when the teller numbers have to be changed, they are reduced by 43% from the initial value, on average. Since several inputs may have to change simultaneously, defining the vector of relative changes r = ((xi − x̂i)/xi), i = 1, ..., I, the last column shows the mean value of the Euclidean norm of this vector. We see, for example, that in the relatively few cases where the teller personnel has to change under ℓ0 + (ℓ2), the changes are relatively large. We see again the difficulties the bank branches apparently have hiring the right number of typists. We saw in Table 6 that they often have to change, and we see now that the changes are non-trivial, with about half of the typing full-time equivalents in excess.
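The per-firm statistics behind these tables can be computed directly; a sketch with made-up staffing numbers (not taken from the dataset):

```python
import numpy as np

# Hypothetical original and counterfactual staffing levels (FTEs) for one firm.
x = np.array([5.8, 1.0, 4.7, 2.1, 4.4])
x_hat = np.array([5.8, 0.5, 4.7, 1.8, 4.4])

# Vector of relative changes r = ((x_i - xhat_i) / x_i), as used in Table 7.
r = (x - x_hat) / x

l0_changes = int(np.count_nonzero(r))  # how many inputs changed (Table 6 style)
l2_of_r = float(np.linalg.norm(r))     # Euclidean norm of relative changes
```

Averaging these two quantities over all inefficient firms yields the last columns of Tables 6 and 7, respectively.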

Cost function   Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean ℓ0(x − x̂)
ℓ0 + (ℓ2)       0.01     0.38     0.14               0.13          0.34     1.00
ℓ0 + ℓ2         0.03     0.40     0.17               0.14          0.38     1.13
ℓ2              0.09     0.45     0.51               0.21          0.47     1.72

Table 6: Average results on how often the inputs (personnel) change when the desired efficiency is E∗ = 1

In Figure 5, we use a heatmap to illustrate which input factors have to change for the individual firms under the three different cost functions in Table 6. Rows with no markings represent firms that were fully efficient to begin with. We see, as we would expect, that the more weight we put on the Euclidean norm, the more densely populated the illustration becomes, i.e., the more inputs have to change simultaneously.

So far we have asked for counterfactual instances that are fully efficient. If we instead only ask for the


Cost function   Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean ℓ2(r)
ℓ0 + (ℓ2)       0.43     0.61     0.41               0.43          0.37     0.4743
ℓ0 + ℓ2         0.21     0.58     0.33               0.38          0.35     0.4742
ℓ2              0.11     0.53     0.14               0.27          0.29     0.4701

Table 7: Average results on how much the inputs (personnel) change when the desired efficiency is E∗ = 1

(a) C = ℓ0 + (ℓ2)   (b) C = ℓ0 + ℓ2   (c) C = ℓ2

Figure 5: The inputs that change when we impose a desired efficiency of E∗ = 1

counterfactual instances to be at least 80% efficient, only 186 firms need to be studied. As before, let us first visualize the counterfactual explanations for firm 238, which had an original efficiency of E0 = 0.72. In Figure 6, we can see the different changes when imposing E∗ = 0.8. We again see that the Farrell approach reduces all inputs proportionally, specifically by 9.5% of their values. We see also that under the ℓ0 + (ℓ2) norm, only the credit personnel has to be reduced, by 15%. Under the Euclidean norm, the teller and the acc. and ledgers personnel are not affected, while the typing, supervision and credit officers have to be reduced, by 4%, 13% and 7%, respectively. Notice that only the change in supervision is higher in this case than in the Farrell solution, while the decrease in the remaining four inputs is significantly smaller for the Euclidean norm. Recall that in Figure 4 the counterfactual explanations for the same firm 238 were calculated imposing E∗ = 1. Altering the desired efficiency level from E∗ = 0.8 to E∗ = 1 leads to rather dramatic changes in the counterfactual explanations. For the ℓ0 + (ℓ2) cost function, for a desired efficiency of E∗ = 0.8 we needed to decrease the credit personnel dramatically, whereas for a desired efficiency of E∗ = 1, it is suggested to leave the credit personnel unchanged and to change the typing personnel instead. On the other hand, what remains the same is the fact that the teller and the acc. and ledgers officers are never affected in the counterfactual explanations with the three cost functions.

After the analysis for a single firm, we now present statistics about the counterfactual explanations


Figure 6: Counterfactual explanations for firm 238 with Problem (CEDEA) and desired efficiency E∗ = 0.8

obtained for all 186 firms that had an original efficiency below 80%. The frequency of changes and the relative sizes of the changes are shown in Tables 8 and 9. We see, as we would expect, that the amount of change necessary is reduced. On the other hand, the inputs to be changed are not vastly different. The tendency to change credit officers in particular is slightly larger now.

Cost function   Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean ℓ0(x − x̂)
ℓ0 + (ℓ2)       0.01     0.30     0.18               0.08          0.44     1.00
ℓ0 + ℓ2         0.03     0.32     0.19               0.09          0.47     1.11
ℓ2              0.13     0.43     0.51               0.18          0.63     1.88

Table 8: Average results on how often the inputs (personnel) change when the desired efficiency is E∗ = 0.8

Cost function   Teller   Typing   Acc. and Ledgers   Supervision   Credit   Mean ℓ2(r)
ℓ0 + (ℓ2)       0.28     0.56     0.28               0.41          0.27     0.3708
ℓ0 + ℓ2         0.12     0.52     0.25               0.39          0.26     0.3707
ℓ2              0.05     0.41     0.11               0.25          0.21     0.3702

Table 9: Average results on how much the inputs (personnel) change when the desired efficiency is E∗ = 0.8

In Figure 7, we show the input factors that need to change for the individual firms under the three different cost functions in Table 8, now for the case E∗ = 0.8. As expected, we now see an increasing number of rows with no markings compared to Figure 5, belonging to the firms that already had an efficiency of at least 0.8.


(a) C = ℓ0 + (ℓ2)   (b) C = ℓ0 + ℓ2   (c) C = ℓ2

Figure 7: The inputs that change when we impose a desired efficiency of E∗ = 0.8

6 Conclusions

In this paper, we have proposed a collection of optimization models for setting targets and finding counterfactual explanations in DEA models, i.e., the least costly changes in the inputs or outputs of a firm that lead to a pre-specified (higher) efficiency level. With our methodology, we are able to include different ways to measure the proximity between a firm and its counterfactual, namely, using the ℓ0, ℓ1, and ℓ2 norms or a combination of them. Calculating counterfactual explanations involves finding "close" alternatives in the complement of a convex set. We have reformulated this bilevel optimization problem as either an MILP or a Mixed Integer Convex Quadratic Problem with linear constraints. Our numerical section shows that, for our banking application, we are able to solve this model to optimality.

DEA models can capture very complex relationships between multiple inputs and outputs. This allows more substantial evaluations and also offers a framework that can support many operational, tactical and strategic planning efforts. However, there is also a risk that such a model is seen as a pure black box, which in turn can lead to mistrust and some degree of model or algorithm aversion. By looking at counterfactuals, a firm can get a better understanding of the production space and is more likely to trust the modelling.

Counterfactuals in DEA can also help a firm choose which changes to implement. It is not always enough to simply think of a strategy and of which factors can easily be changed, say a direction in input space. It is also important what the technology looks like, and therefore how large such changes need to be to get a desired improvement in efficiency. In this way, the analysis of close counterfactuals can help endogenize the choice of both desirable and effective directions to move in. By varying the parameters of the cost function, the firm can even get a menu of counterfactuals from which it can choose, thus having more flexibility and leading the evaluated firm to gain more trust in the underlying model.

Note also that by calculating the counterfactual explanations for all firms involved, as we did in our banking application, one can determine which combinations of inputs and outputs most commonly need to be changed to improve efficiency. This is interesting from an overall system point of view. Society at large, or for example a regulator tasked primarily with incentivizing natural monopolies to improve efficiency, may not solely be interested in everyone becoming efficient. It may also matter how the efficiency is improved, e.g. by reducing the use of imported or domestic resources, or by laying off some particular types of labor and not others.

There are several interesting extensions that can be explored in future research; here we mention two. One possibility is to use alternative efficiency measures to constrain the search for counterfactual instances. We have used Farrell efficiency, which is by far the most common efficiency measure in DEA studies, but one might consider alternative measures, e.g. additive ones like the excess measure. Another relevant extension would be to make the counterfactuals less individualized. One could, for example, look for the common features that counterfactual explanations should change across all individual firms so as to achieve the minimum total cost.

Acknowledgements

This research has been financed in part by research projects EC H2020 MSCA RISE NeEDS (Grant 822214); COST Action CA19130 - FinAI; FQM-329, P18-FR-2369 and US-1381178 (Junta de Andalucía); PID2019-110886RB-I00 and PID2022-137818OB-I00 (Ministerio de Ciencia, Innovación y Universidades, Spain); and Independent Research Fund Denmark (Grant 9038-00042A), "Benchmarking-based incentives and regulatory applications". The support is gratefully acknowledged.

References

Agrell, P. and Bogetoft, P. (2017). Theory, techniques and applications of regulatory benchmarking and

productivity analysis. In Oxford Handbook of Productivity Analysis, pages 523–555. Oxford University

Press: Oxford.

Antle, R. and Bogetoft, P. (2019). Mix stickiness under asymmetric cost information. Management

Science, 65(6):2787–2812.

Aparicio, J., Cordero, J. M., and Pastor, J. T. (2017). The determination of the least distance to the

strongly eﬃcient frontier in data envelopment analysis oriented models: modelling and computational

aspects. Omega, 71:1–10.

Aparicio, J., Mahlberg, B., Pastor, J., and Sahoo, B. (2014). Decomposing technical ineﬃciency using

the principle of least action. European Journal of Operational Research, 239:776–785.

Aparicio, J. and Pastor, J. (2013). A well-deﬁned eﬃciency measure for dealing with closest targets in

DEA. Applied Mathematics and Computation, 219:9142–9154.

Aparicio, J., Ruiz, J. L., and Sirvent, I. (2007). Closest targets and minimum distance to the pareto-

eﬃcient frontier in DEA. Journal of Productivity Analysis, 28:209–218.


Bogetoft, P. (2012). Performance Benchmarking - Measuring and Managing Performance. Springer,

New York.

Bogetoft, P. and Hougaard, J. (1999). Eﬃciency evaluation based on potential (non-proportional) im-

provements. Journal of Productivity Analysis, 12:233–247.

Bogetoft, P. and Otto, L. (2011). Benchmarking with DEA, SFA, and R. Springer, New York.

Carrizosa, E., Ramírez Ayerbe, J., and Romero Morales, D. (2023). Mathematical optimization modelling for group counterfactual explanations. Technical report, IMUS, Sevilla, Spain. https://www.researchgate.net/publication/368958766_Mathematical_Optimization_Modelling_for_Group_Counterfactual_Explanations.

Carrizosa, E., Ramírez Ayerbe, J., and Romero Morales, D. (2024). Generating collective counterfactual explanations in score-based classification via mathematical optimization. Expert Systems With Applications, 238:121954.

Charnes, A., Cooper, W. W., Lewin, A. Y., and Seiford, L. M. (1995). Data Envelopment Analysis:

Theory, Methodology and Applications. Kluwer Academic Publishers, Boston, USA.

Charnes, A., Cooper, W. W., and Rhodes, E. (1978). Measuring the eﬃciency of decision making units.

European Journal of Operational Research, 2:429–444.

Charnes, A., Cooper, W. W., and Rhodes, E. (1979). Short Communication: Measuring the Eﬃciency

of Decision Making Units. European Journal of Operational Research, 3:339.

Cherchye, L., Rock, B. D., Dierynck, B., Roodhooft, F., and Sabbe, J. (2013). Opening the “black box”

of eﬃciency measurement: Input allocation in multioutput settings. Operations Research, 61(5):1148–

1165.

Du, M., Liu, N., and Hu, X. (2019). Techniques for interpretable machine learning. Communications of

the ACM, 63(1):68–77.

Dynan, K. (2000). Habit formation in consumer preferences: Evidence from panel data. American

Economic Review, 90(3):391–406.

European Commission (2020). White Paper on Artificial Intelligence: a European approach to excellence and trust. https://ec.europa.eu/info/publications/white-paper-artificial-intelligence-european-approach-excellence-and-trust_en.

Färe, R. and Grosskopf, S. (2000). Network DEA. Socio-Economic Planning Sciences, 34(1):35–49.

Färe, R., Grosskopf, S., and Whittaker, G. (2013). Directional output distance functions: endogenous directions based on exogenous normalization constraints. Journal of Productivity Analysis, 40:267–269.

Färe, R., Pasurka, C., and Vardanyan, M. (2017). On endogenizing direction vectors in parametric directional distance function-based models. European Journal of Operational Research, 262:361–369.

Fischetti, M. and Jo, J. (2018). Deep neural networks and mixed integer linear optimization. Constraints,

23(3):296–309.

Fuhrer, J. C. (2000). Habit formation in consumption and its implications for monetary-policy models.

The American Economic Review, 90(4):367–390.


Goodman, B. and Flaxman, S. (2017). European Union regulations on algorithmic decision-making and

a “right to explanation”. AI Magazine, 38(3):50–57.

Guidotti, R. (2022). Counterfactual explanations and how to ﬁnd them: literature review and bench-

marking. Forthcoming in Data Mining and Knowledge Discovery.

Gurobi Optimization, LLC (2021). Gurobi Optimizer Reference Manual.

Hall, R. (2004). Measuring factor adjustment costs. The Quarterly Journal of Economics, 119(3):899–

927.

Hamermesh, D. S. and Pfann, G. A. (1999). Adjustment costs in factor demand. Journal of Economic

Literature, 34(3):1264–1292.

Haney, A. and Pollitt, M. (2009). Eﬃciency analysis of energy networks: An international survey of

regulators. Energy Policy, 37(12):5814–5830.

Kao, C. (2009). Eﬃciency decomposition in network data envelopment analysis: A relational model.

European Journal of Operational Research, 192:949–962.

Karimi, A.-H., Barthe, G., Schölkopf, B., and Valera, I. (2022). A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Computing Surveys, 55(5):1–29.

Lundberg, S. and Lee, S.-I. (2017). A uniﬁed approach to interpreting model predictions. In Advances

in Neural Information Processing Systems, pages 4765–4774.

Martens, D. and Provost, F. (2014). Explaining data-driven document classiﬁcations. MIS Quarterly,

38(1):73–99.

Molnar, C., Casalicchio, G., and Bischl, B. (2020). Interpretable machine learning–a brief history,

state-of-the-art and challenges. In Joint European Conference on Machine Learning and Knowledge

Discovery in Databases, pages 417–431. Springer.

Parmentier, A. and Vidal, T. (2021). Optimal counterfactual explanations in tree ensembles. In Inter-

national Conference on Machine Learning, pages 8422–8431. PMLR.

Parmeter, C. and Zelenyuk, V. (2019). Combining the virtues of stochastic frontier and data envelopment

analysis. Operations Research, 67(6):1628–1658.

Petersen, N. (2018). Directional Distance Functions in DEA with Optimal Endogenous Directions.

Operations Research, 66(4):1068–1085.

Rigby, D. (2015). Management tools 2015 - an executive’s guide. Bain & Company.

Rigby, D. and Bilodeau, B. (2015). Management tools and trends 2015. Bain & Company.

Rostami, S., Neri, F., and Epitropakis, M. (2017). Progressive preference articulation for decision making

in multi-objective optimisation problems. Integrated Computer-Aided Engineering, 24(4):315–335.

Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., and Zhong, C. (2022). Interpretable machine

learning: Fundamental principles and 10 grand challenges. Statistics Surveys, 16:1–85.

Schaﬀnit, C., Rosen, D., and Paradi, J. (1997). Best practice analysis of bank branches: An application

of DEA in a large Canadian bank. European Journal of Operational Research, 98(2):269–289.


Silva Portela, M., Borges, P., and Thanassoulis, E. (2003). Finding closest targets in non-oriented DEA

models: The case of convex and non-convex technologies. Journal of Productivity Analysis, 19:251–269.

Thach, P. (1988). The design centering problem as a DC programming problem. Mathematical Program-

ming, 41(1):229–248.

Wachter, S., Mittelstadt, B., and Russell, C. (2017). Counterfactual explanations without opening the

black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841–887.

Zhu, J. (2016). Data Envelopment Analysis - A Handbook on Models and Methods. Springer New York.

Zofío, J. L., Pastor, J. T., and Aparicio, J. (2013). The directional profit efficiency measure: on why profit inefficiency is either technical or allocative. Journal of Productivity Analysis, 40:257–266.

Appendix

Here, we extend the analysis in Section 4 by investigating alternative returns to scale and changes in the outputs rather than the inputs. In both extensions, we consider a combination of the ℓ0, ℓ1 and ℓ2 norms, as in objective function (22).

A Changing the returns to scale

In Section 4, we considered the DEA model with constant returns to scale (CRS), where the only requirement on the values of $\lambda$ is that they are nonnegative, i.e., $\lambda \in \mathbb{R}^{K+1}_+$, but we could consider other technologies. In that case, to transform our bilevel optimization problem into a single-level one, we must take into account that for each new constraint derived from the conditions on $\lambda$, a new dual variable has to be introduced. We will consider the variable returns to scale (VRS) model, as it is one of the models most preferred by firms (Bogetoft, 2012), but extensions to other models are analogous.

Consider the input case. With the same transformation as before, we have:
$$
\begin{array}{rl}
\min\limits_{\hat{x},\, F} & C(x^0, \hat{x}) \\[2pt]
\text{s.t.} & \hat{x} \in \mathbb{R}^{I}_{+} \\[2pt]
& F \le F^{*} \\[2pt]
& F \in \arg\min\limits_{\bar{F},\, \beta_0,\ldots,\beta_K} \left\{ \bar{F} : \hat{x} \ge \sum_{k=0}^{K} \beta_k x^k,\;\; \bar{F} y^0 \le \sum_{k=0}^{K} \beta_k y^k,\;\; \bar{F} \ge 0,\;\; \beta \in \mathbb{R}^{K+1}_{+},\;\; \sum_{k=0}^{K} \beta_k = \bar{F} \right\}.
\end{array}
$$

Notice that the only difference is that we have a new constraint associated with the technology, namely $\sum_{k=0}^{K} \beta_k = F$. Let $\kappa \ge 0$ be the new dual variable associated with this constraint. Then, the following changes are made in constraints (19) and (20):
$$
\gamma_O^\top y^0 + \kappa = 1 \qquad (30)
$$
$$
\gamma_I^\top x^k - \gamma_O^\top y^k - \kappa \ge 0 \qquad k = 0, \ldots, K. \qquad (31)
$$

The single-level formulation for the counterfactual problem for VRS DEA is as follows:
$$
\begin{array}{rl}
\min\limits_{\hat{x},\, F,\, \beta,\, \gamma_I,\, \gamma_O,\, u,\, v,\, w,\, \kappa,\, \eta,\, \xi} & \nu_0 \sum_{i=1}^{I} \xi_i + \nu_1 \sum_{i=1}^{I} \eta_i + \nu_2 \sum_{i=1}^{I} (x^0_i - \hat{x}_i)^2 \qquad \text{(CEVDEA)} \\[2pt]
\text{s.t.} & F \le F^{*} \\[2pt]
& \sum_{k=0}^{K} \beta_k = F \\[2pt]
& \hat{x} \in \mathbb{R}^{I}_{+} \\[2pt]
& \kappa \ge 0 \\[2pt]
& u, v, w \in \{0,1\} \\[2pt]
& (7)-(10),\; (15)-(17),\; (21),\; (23)-(31).
\end{array}
$$
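As a minimal numerical sketch (not part of the original paper), the lower-level VRS efficiency problem that the formulation above builds on can be solved as a small linear program. The dataset below is hypothetical (four firms, one input, one output), and scipy is assumed as the solver:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical dataset: four firms, one input, one output (rows are firms).
X = np.array([[2.0], [4.0], [6.0], [5.0]])
Y = np.array([[1.0], [2.0], [3.0], [1.5]])

def vrs_input_efficiency(X, Y, firm):
    """Farrell input efficiency under VRS:
    min F  s.t.  F x^0 >= sum_k lambda_k x^k,  y^0 <= sum_k lambda_k y^k,
                 sum_k lambda_k = 1,  lambda >= 0."""
    K, I = X.shape
    O = Y.shape[1]
    c = np.zeros(1 + K)          # decision vector: [F, lambda_0, ..., lambda_{K-1}]
    c[0] = 1.0                   # minimize F
    A_ub, b_ub = [], []
    for i in range(I):           # sum_k lambda_k x^k_i - F x^0_i <= 0
        A_ub.append(np.concatenate(([-X[firm, i]], X[:, i])))
        b_ub.append(0.0)
    for o in range(O):           # y^0_o - sum_k lambda_k y^k_o <= 0
        A_ub.append(np.concatenate(([0.0], -Y[:, o])))
        b_ub.append(-Y[firm, o])
    A_eq = [np.concatenate(([0.0], np.ones(K)))]   # VRS: sum_k lambda_k = 1
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.array(A_eq), b_eq=[1.0], bounds=(0, None))
    return res.fun

print(vrs_input_efficiency(X, Y, firm=3))  # 0.6: firm 3 could run on 60% of its input
```

The counterfactual model (CEVDEA) then searches over modified inputs $\hat{x}$ so that this lower-level optimum reaches the target level, with the KKT conditions of this LP embedded as constraints.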

B Changing the outputs

We have calculated the counterfactual instance of a firm as the minimum cost changes in the inputs needed to achieve a better efficiency. In the same vein, we could instead consider changes in the outputs, leaving the inputs unchanged. Again, suppose firm 0 is not fully efficient, $E_0 < 1$. Now we are interested in calculating the minimum changes in the outputs $y^0$ that give it a higher efficiency $E^{*} > E_0$. Let $\hat{y}$ be the new outputs of firm 0 that make it at least $E^{*}$-efficient. We then have the following bilevel optimization problem:

$$
\begin{array}{rl}
\min\limits_{\hat{y},\, E} & C(y^0, \hat{y}) \\[2pt]
\text{s.t.} & \hat{y} \in \mathbb{R}^{O}_{+} \\[2pt]
& E \ge E^{*} \\[2pt]
& E \in \arg\min\limits_{\bar{E},\, \lambda_0,\ldots,\lambda_K} \left\{ \bar{E} : \bar{E} x^0 \ge \sum_{k=0}^{K} \lambda_k x^k,\;\; \hat{y} \le \sum_{k=0}^{K} \lambda_k y^k,\;\; \bar{E} \ge 0,\;\; \lambda \in \mathbb{R}^{K+1}_{+} \right\}.
\end{array}
$$

Following similar steps as in the previous section, the single-level formulation for the counterfactual problem in DEA in the output case is as follows:

$$
\begin{array}{rl}
\min\limits_{\hat{y},\, E,\, \lambda,\, \gamma_I,\, \gamma_O,\, u,\, v,\, w,\, \eta,\, \xi} & \nu_0 \sum_{o=1}^{O} \xi_o + \nu_1 \sum_{o=1}^{O} \eta_o + \nu_2 \sum_{o=1}^{O} (y^0_o - \hat{y}_o)^2 \qquad \text{(CEODEA)} \\[2pt]
\text{s.t.} & \hat{y} \in \mathbb{R}^{O}_{+} \\[2pt]
& E \ge E^{*} \\[2pt]
& E x^0 \ge \sum_{k=0}^{K} \lambda_k x^k \\[2pt]
& \hat{y} \le \sum_{k=0}^{K} \lambda_k y^k \\[2pt]
& \gamma_I^\top x^0 = 1 \\[2pt]
& \gamma_O^\top y^k - \gamma_I^\top x^k \le 0 \qquad k = 0, \ldots, K \\[2pt]
& \gamma_I^i \le M_I u_i \qquad i = 1, \ldots, I \\[2pt]
& E x^0_i - \sum_{k=0}^{K} \lambda_k x^k_i \le M_I (1 - u_i) \qquad i = 1, \ldots, I \\[2pt]
& \gamma_O^o \le M_O v_o \qquad o = 1, \ldots, O \\[2pt]
& -\hat{y}_o + \sum_{k=0}^{K} \lambda_k y^k_o \le M_O (1 - v_o) \qquad o = 1, \ldots, O \\[2pt]
& \lambda_k \le \widetilde{M} w_k \qquad k = 0, \ldots, K \\[2pt]
& \gamma_I^\top x^k - \gamma_O^\top y^k \le \widetilde{M} (1 - w_k) \qquad k = 0, \ldots, K \\[2pt]
& -M_{\mathrm{zero}}\, \xi_o \le y^0_o - \hat{y}_o \le M_{\mathrm{zero}}\, \xi_o \qquad o = 1, \ldots, O \\[2pt]
& \eta_o \ge y^0_o - \hat{y}_o \qquad o = 1, \ldots, O \\[2pt]
& \eta_o \ge -(y^0_o - \hat{y}_o) \qquad o = 1, \ldots, O \\[2pt]
& E, \lambda, \gamma_I, \gamma_O, \eta \ge 0 \\[2pt]
& u, v, w, \xi \in \{0,1\}.
\end{array}
$$

As in the input model, depending on the cost function, we either obtain an MILP model or a Mixed

Integer Convex Quadratic model with linear constraints. This model could be formulated analogously

for the VRS case.
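The lower-level problem of this output-oriented model, i.e., the Farrell input efficiency of firm 0 with its original inputs and candidate outputs $\hat{y}$, is again a small LP. A hedged sketch with hypothetical CRS data, assuming scipy as the solver, shows how raising an output raises the efficiency score:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical CRS data: two firms, one input, one output (rows are firms).
X = np.array([[2.0], [4.0]])
Y = np.array([[2.0], [2.0]])

def crs_input_efficiency(X, Y, x0, y_hat):
    """Farrell input efficiency of a firm with inputs x0 and candidate
    outputs y_hat under CRS:
    min E  s.t.  E x0 >= sum_k lambda_k x^k,  y_hat <= sum_k lambda_k y^k."""
    K, I = X.shape
    O = Y.shape[1]
    c = np.zeros(1 + K)          # decision vector: [E, lambda_0, ..., lambda_{K-1}]
    c[0] = 1.0                   # minimize E
    A_ub, b_ub = [], []
    for i in range(I):           # sum_k lambda_k x^k_i - E x0_i <= 0
        A_ub.append(np.concatenate(([-x0[i]], X[:, i])))
        b_ub.append(0.0)
    for o in range(O):           # y_hat_o - sum_k lambda_k y^k_o <= 0
        A_ub.append(np.concatenate(([0.0], -Y[:, o])))
        b_ub.append(-y_hat[o])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=(0, None))
    return res.fun

# Firm 1 with its original outputs, and with a counterfactual higher output.
print(crs_input_efficiency(X, Y, X[1], Y[1]))             # 0.5
print(crs_input_efficiency(X, Y, X[1], np.array([3.0])))  # 0.75
```

The counterfactual model (CEODEA) automates this search: rather than trying candidate $\hat{y}$ by hand, it finds the least costly output change whose induced lower-level efficiency reaches $E^{*}$.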
