Article

Variable selection in Data Envelopment Analysis

Abstract

The selection of inputs and outputs in Data Envelopment Analysis (DEA) is regarded as an important step that is normally conducted before the DEA model is implemented. In this paper, we introduce cardinality constraints directly into the DEA program in order to select the relevant inputs and outputs automatically, without any previous statistical analysis, heuristic decision making or expert judgement (though our method is not incompatible with these other approaches and indeed may help to choose among them). The selection of variables is obtained by solving a mixed integer linear program (MILP) which specifies the maximal number of variables to be used. The program solves quickly in all practical situations. We explore the performance of the method via Monte Carlo simulations. Some empirical applications are considered in order to illustrate the usefulness of the method.
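
As a rough illustration of what a cardinality constraint can look like inside a DEA program, the sketch below writes a constant-returns multiplier-form model with binary selection indicators. It is not taken from the paper, whose exact formulation is not reproduced on this page. The indicators $z_i$ and $w_r$, the big-M constant $M$, the cap $K$, and the summed-efficiency objective are all assumptions made for illustration.

```latex
% Illustrative sketch only (assumed notation, not the authors' formulation).
% Data: x_{ij} = input i of DMU j, y_{rj} = output r of DMU j, j = 1..n.
% v_i^o, u_r^o: multiplier weights used when evaluating DMU o.
% z_i, w_r in {0,1}: 1 if input i / output r is selected (shared by all DMUs).
\begin{align}
\max_{u,v,z,w}\quad & \sum_{o=1}^{n}\sum_{r=1}^{s} u_r^{o}\, y_{ro} \\
\text{s.t.}\quad
& \sum_{i=1}^{m} v_i^{o}\, x_{io} = 1, && o = 1,\dots,n, \\
& \sum_{r=1}^{s} u_r^{o}\, y_{rj} - \sum_{i=1}^{m} v_i^{o}\, x_{ij} \le 0, && o, j = 1,\dots,n, \\
& 0 \le v_i^{o} \le M z_i, \quad 0 \le u_r^{o} \le M w_r, && \forall\, i, r, o, \\
& \sum_{i=1}^{m} z_i + \sum_{r=1}^{s} w_r \le K, \\
& z_i,\ w_r \in \{0,1\}.
\end{align}
```

In this sketch, setting $z_i = 0$ forces every weight on input $i$ to zero, which has the same effect as dropping that input from the model, and $K$ plays the role of the maximal number of variables mentioned in the abstract. The constant $M$ must be large enough not to cut off optimal weights, which in practice depends on how the data are scaled.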

... However, an important issue that is still not resolved in the literature is how the input and output variables used in the model are selected. Since the selection of input and output variables is done before starting the analysis, every researcher wants to know how the input and output variables are selected (Peyrache et al., 2020). Therefore, in addition to the selection of the DEA model, the selection of appropriate input and output variables is an important part that researchers should keep in mind (Luo et al., 2012). ...
... The statistical significance of input and output variables is assessed using statistical methods such as least significant difference (LSD) and Welch's statistics. Peyrache et al. (2020) introduce cardinality constraints directly into DEA programs to automatically select relevant inputs and outputs without prior statistical analysis, heuristic decision-making, or expert judgment. The choice of variables is obtained by solving a Mixed Integer Linear Program (MILP) specifying the maximum number of variables to use. ...
... e.g. [4,18,21]. ...
... count the number of DMUs with an efficiency above E. Constraints (21) are now enforced only for the pairs of DMUs in the slice, removing the ties between DMUs that are not in this cluster. Constraints ...
Preprint
In modelling the relative performance of a set of Decision Making Units (DMUs), a common challenge is to account for heterogeneity in the services they provide and the settings in which they operate. One solution is to include many features in the model and thereby use a one-size-fits-all model that is sufficiently complex to account for this heterogeneity. Another approach is to introduce several simpler models for different clusters of the DMUs. In this paper, we investigate the joint problem of DMU clustering and feature selection. The goal is to find a small number of clusters of DMUs and the features that can be used in each cluster to maximize the average efficiency of the DMUs. We formulate this as a Mixed Integer Linear Optimization problem and propose a collection of constructive heuristics based on different types of similarity between DMUs. The approach is used on a real-world dataset from the benchmarking of electricity Distribution System Operators, as well as on simulated data. We show that by introducing clusters we can considerably reduce the number of features necessary to get high efficiency levels.
... The selection of input and output variables in DEA is regarded as an important step that is normally conducted before the DEA model is implemented. Available techniques are, on the one hand, based on expert intervention, using heuristic decision-making, and expert judgement (e.g., using Delphi), and, on the other hand, fully automatic approaches [39] which in turn maximize efficiencies and lose discrimination power without a full understanding of the domain. There is a lack of data-based methodologies and use cases that avoid bias of experts and at the same time provide useful, repeatable, and interpretable results. ...
... Labor and energy consumption can ... The selection of inputs and outputs is especially relevant in this scenario due to the large number of available variables and the modest sample size. Following the cardinality constraints introduced in [39], the recommended number of variables for these two CRS models is 3 (in case of considering VRS it would be 2). The selected variables depend eventually on EDA on the available data; however, a tentative output is the number of passengers, and the models can be considered input-oriented, designed for minimizing inputs when moving a given number of people. ...
Article
This paper deals with the efficiency and sustainability of urban rail transit (URT) using exploratory data analytics (EDA) and data envelopment analysis (DEA). The first stage of the proposed methodology is EDA with already available indicators (e.g., the number of stations and passengers), and suggested indicators (e.g., weekly frequencies, link occupancy rates, and CO2 footprint per journey) to directly characterize the efficiency and sustainability of this transport mode. The second stage is to assess the efficiency of URT with two original models, based on a thorough selection of input and output variables, which is one of the key contributions of EDA to this methodology. The first model compares URT against other urban transport modes, applicable to route personalization, and the second scores the efficiency of URT lines. The main outcome of this paper is the proposed methodology, which has been experimentally validated using open data from the Transport for London (TfL) URT network and additional sources.
... One of the first proposals used the Efficiency Contribution Measure (ECM) of each variable (Pastor et al., 2002), via a hypothesis test determining whether an input is relevant or not. Other approaches used regression-based analysis, such as Ruggiero (2005), bootstrapping methodology (Simar and Wilson, 2000a), or enriched the DEA optimization programs using binary variables to model which inputs are selected (Peyrache et al., 2020; Benítez-Peña et al., 2020), as well as statistical methods (Araújo et al., 2014). Another family of approaches performs aggregations of the available variables, creating new variables which inform of the characteristics of the data, but losing interpretability. ...
... It is important to mention that we tried to be parsimonious when selecting the number of inputs and outputs to include, in order to avoid the dimensionality problem in our model. Since we work with a small sample size (around 50 schools per region), we decided to include only two inputs and two outputs in our analysis (Peyrache et al., 2020). ...
Article
This paper provides evidence on the quality of the Spanish education system by measuring the efficiency and equity of each of the Autonomous Communities. To do so, we compare the productivity of each region by decomposing the gaps in terms of efficiency and technological change. Then, several dimensions of equity in terms of education are considered through multiple indicators used in the literature to measure the influence of students’ socio-economic status on their performance. Finally, we address the trade-off between educational efficiency and equity, and we discuss potential policies and interventions for improvement.
... In this regard, there are some "rules of thumb" in the literature that suggest the relation between the number of observations and the number of variables (Banker et al., 1989; Cook et al., 2014). As discussed by Peyrache et al. (2020), variable selection is still an unresolved issue in the DEA literature, as it is strongly dependent on the experience of the researchers and on the interpretation of the efficiency scores. In our study, we considered all these issues, and the choice of the variables was based on the traditional interpretation of production functions in agriculture, i.e., the output (revenue, production, etc.) is derived from land, labor, and capital. ...
Article
Background: Given the importance of the agricultural activity for the economic development of the state of Rio de Janeiro, Brazil, in this paper we assess the effect of the environmental parameters related to soils and socioeconomic factors on the performance of the municipalities. Objective: To identify factors that influence agricultural production performance, as well as the directions of such influences (positive or negative). Methodology: A two-stage Data Envelopment Analysis (DEA) was chosen for this analysis. The performance scores are computed considering land, labor, and capital (or technology) as inputs, and the value of crops and of livestock production as outputs. Results: The average efficiency was 0.5509 and 12 municipalities out of the 89 assessed were 100% efficient. A high level of susceptibility to erosion significantly and negatively influences the efficiency scores. The suitability of land for agriculture and for livestock are positively associated with performance. The presence of family-based farmers favors the agricultural performance of the assessed municipalities. Implications: These results may support public policies related to land use and soil governance. Conclusions: The proposed two-stage DEA approach was useful to assess the influence of factors related to soils and socioeconomic indicators on the agricultural performance of the municipalities in the state of Rio de Janeiro.
... For different rules of thumb concerning the acceptable number of inputs and outputs, see, for example, Cooper et al. (2007) and Peyrache et al. (2020). ...
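
To make such rules of thumb concrete, here is a minimal sketch of one commonly quoted form, usually attributed to Cooper et al. (2007): the number of DMUs n should be at least max{m × s, 3(m + s)}, where m is the number of inputs and s the number of outputs. The function names `min_dmus` and `satisfies_rule` are hypothetical, and the rule is a heuristic rather than a guarantee of discriminatory power.

```python
def min_dmus(n_inputs: int, n_outputs: int) -> int:
    """Smallest sample size suggested by the rule n >= max(m*s, 3*(m+s))."""
    return max(n_inputs * n_outputs, 3 * (n_inputs + n_outputs))


def satisfies_rule(n_dmus: int, n_inputs: int, n_outputs: int) -> bool:
    """True if the sample is at least as large as the rule of thumb suggests."""
    return n_dmus >= min_dmus(n_inputs, n_outputs)


# Example: two inputs and two outputs with about 50 units per sample.
print(min_dmus(2, 2))            # suggested minimum number of DMUs
print(satisfies_rule(50, 2, 2))  # whether 50 DMUs meet the heuristic
```

The check only compares counts; it says nothing about which variables to keep, which is the question that selection methods such as cardinality constraints address.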
Article
Applications of data envelopment analysis (DEA) often include inputs and outputs that are embedded in some other inputs or outputs. For example, in a school assessment, the sets of students achieving good academic results or students with special needs are subsets of the set of all students. In a hospital application, the set of specific or successful treatments is a subset of all treatments. Similarly, in many applications, labour costs are a part of overall costs. Conventional variable and constant returns-to-scale DEA models cannot incorporate such information. Using such standard DEA models may potentially lead to a situation in which, in the resulting projection of an inefficient decision making unit, the value of an input or output representing the whole set is less than the value of an input or output representing its subset, which is physically impossible. In this paper, we demonstrate how the information about embedded inputs and outputs can be incorporated in the DEA models. We further identify common scenarios in which such information is redundant and makes no difference to the efficiency assessment and scenarios in which such information needs to be incorporated in order to keep the efficient projections consistent with the identified embeddings.
... We refer the reader to a more thorough discussion and comparison of these and other techniques in [36]. Other contributions propose methods which evaluate the importance of subsets of variables, such as those that enrich the optimization programs using binary variables to model the inclusion or exclusion of variables [37][38][39]. Along these lines, criteria such as Akaike's Information Criteria [40] or game-theoretic measures such as the Shapley value [41] have been used to choose among models. ...
Article
In this paper, we propose and compare new methodologies for ranking the importance of variables in productive processes via an adaptation of One-Class Support Vector Machines. In particular, we adapt two methodologies inspired by the machine learning literature: one involving the random shuffling of values of a variable and another one using the objective value of the dual formulation of the model. Additionally, we motivate the use of this type of algorithm in the production context and compare their performance via a computational experiment. We observe that the methodology based on shuffling the values of a variable outperforms the methodology based on the dual formulation. We observe that the shuffling-based methodology correctly ranks the variables in 94% of the scenarios with one relevant input and one irrelevant input. Moreover, it correctly ranks each variable in at least 65% of replications of a scenario with three relevant inputs and one irrelevant input.
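
The shuffling idea mentioned in this abstract can be sketched generically as permutation importance: shuffle one variable across observations and measure how much a model's error grows. The code below is only an illustration under stated assumptions. It uses a fixed stand-in predictor instead of the One-Class Support Vector Machine adaptation described above, and the names `mse`, `shuffle_importance`, and `predict` are hypothetical.

```python
import random


def mse(rows, targets, predict):
    """Mean squared error of a predictor over the observations."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def shuffle_importance(rows, targets, predict, column, rng):
    """Increase in error after shuffling one column across observations."""
    baseline = mse(rows, targets, predict)
    values = [r[column] for r in rows]
    rng.shuffle(values)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, values)]
    return mse(permuted, targets, predict) - baseline


rng = random.Random(0)
# Synthetic data (assumption): the output depends on variable 0 only.
rows = [[rng.uniform(1, 10), rng.uniform(1, 10)] for _ in range(200)]
targets = [2.0 * r[0] for r in rows]


def predict(row):
    """Stand-in for a fitted model; it ignores variable 1."""
    return 2.0 * row[0]


importances = [shuffle_importance(rows, targets, predict, c, rng) for c in range(2)]
ranking = sorted(range(2), key=lambda c: importances[c], reverse=True)
print(ranking)  # variables ordered from most to least important
```

Because the stand-in predictor never reads variable 1, shuffling it leaves the error unchanged, so the relevant variable is ranked first. A real study would replace `predict` with the fitted frontier or classifier.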
... In order to perform a DEA, it is vital to select inputs and outputs, which is a significant step that is conducted before the DEA application (Peyrache et al., 2020). In the current research, the input factors are selected based on the DESI sub-dimensions, as this is a methodology that measures the progress of the digital economy and society. ...
... Keshavarz and Toloo (2020) developed a novel selecting DEA model to evaluate the sustainability status of electricity-generation technologies in the United Kingdom. Peyrache et al. (2020) introduced cardinality constraints directly into the DEA model with the VRS technology. M. Toloo et al. (2021) illustrated that selecting DEA models with different orientations (inputs and outputs) may lead to different results and then presented the integrated models to identify a set of common data for both orientations. ...
... Identification of inputs and outputs is an integral part of DEA and SFA. The selection of variables is done through two mechanisms: (1) solving a mixed-integer linear program (MILP), primarily employed when there is no heuristic decision-making or expert judgement, as proposed by Peyrache et al. (2020), and (2) previous research on efficiency (Mattson and Tidana 2019). Courts are labor-intensive, and all the previous research has incorporated at least one measure of labor but has not considered capital as an input (Mattson and Tidana 2019). ...
Article
One of the four pillars of democracy in India is the judiciary, which in the recent past has experienced the 'cyclic syndrome' of arrears. There are 3.5 crore cases pending in the Indian judicial system that has a bearing on contract enforcement. A burgeoning stream of literature has reported the role of the judiciary in economic growth and development. In the wake of a given potential economic multiplier of the judicial system, examining the factors affecting the performance of the judiciary should merit attention. The present study juxtaposes jurisprudence and production theory, not frequently examined in the same gust by employing Data Envelopment Analysis (DEA), Malmquist Productivity Index (MPI), Stochastic Frontier Analysis (SFA), and regression for High Courts and Subordinate Courts. Employing the dataset for the years 2014-19, we investigate the technical efficiency and productivity of the High Courts and their Subordinate Courts and examine the factors influencing the dissolved cases. Furthermore, we examine the impact of COVID-19 on the cases instituted and cases disposed of. To sum up, the paper, thus, touches upon two basic dimensions of justice for High Courts and Subordinate Courts in India: Timeliness in the disposal of cases and the proportionate use of the state's resources. The study confirms the role of judges, judicial staff, and demand for justice on the supply of justice. Shreds of evidence point toward the need to introduce a "cocktail-based" approach instead of a "one-size-fits-all". Supplementary information: The online version contains supplementary material available at 10.1007/s43546-022-00377-1.
... The selection of correct input-output data for efficiency and productivity change is essential for accurate DEA estimation results [47]. Water efficiency cannot be measured with only a single water input. ...
Article
This research evaluates the effects of the Three Red Lines policy on water usage efficiency (WUE), production technology heterogeneity, and water productivity change in 31 Chinese provinces between 2006 and 2020. SMB-DEA, Meta-frontier analysis, and Malmquist–Luenberger index (MLI) techniques were employed for estimation. Results revealed that the mean WUE (2006–2020) in all Chinese provinces was 0.52, with an improvement potential of 48%. Shanghai, Beijing, Shaanxi, and Tianjin were the best performers. The WUE scores before (2006–2011) and after (2012–2020) water policy implementation were 0.58 and 0.48, respectively; on average, there was more than a 9% decline in WUE after the implementation of the water policy. The eastern region has the most advanced water utilization technology as its technology gap ratio (TGR) is nearly 1. The average MLI (2006–2020) score was 1.13, suggesting that the MLI has increased by 12.57% over the study period. Further technology change (TC) is the key predictor of MLI growth, whereas efficiency change (EC) diminished from 2006 to 2020. The mean MLI score for 2006–2011 was 1.16, whereas the MLI Score for the period 2012–2020 was 1.10, indicating a modest decline following the implementation of the water policy. All three Chinese regions experienced MLI growth during 2006–2020, with TC the main change factor.
... As water loss gives rise to revenue decrease for the water utility, environmental impact, and social detriment when demand is unmet, this variable was associated with an opportunity cost that should be reduced, and consequently, it was included in the input set. The selection of input and output variables is an important step of model specification in DEA because the results of the efficiency analysis are conditioned by these choices [73]. Such choices are critical if the sample size is particularly small in comparison to the number of DMUs and when the DEA model is performed under the assumption of variable returns to scale [74]. ...
Article
Data relative to the water services industry in Italy indicate that there is a serious infrastructure gap between the southern regions and isles and the rest of the country. In these geographical areas, water utilities are provided with substantial public grants from the central and local governments to support investments necessary to mitigate the infrastructure divide by increasing capacity and improve service quality. This paper implements a meta-frontier non-parametric approach based on a data envelopment analysis (DEA) to evaluate the efficiencies of 71 Italian water utilities, accounting for the differentiated contexts in which they operate. A short-term perspective was assumed to estimate efficiency, considering the production factors associated with the infrastructure assets as non-discretionary inputs in the specification of the meta-frontier model. The results showed that water utilities operating in the southern regions and isles suffer from an efficiency gap in comparison to those in the northern and central regions. The average efficiency gap was 9.7%, achieving 24.9% in the worst case. Moreover, a more in-depth analysis focusing on the water utilities in the southern regions and isles indicated that scale inefficiencies might be an important determinant of such an efficiency gap. Indeed, slightly more than 69% of the water utilities operated at increasing returns to scale. Evidence from this study raises concern about the appropriate structure of the Italian water service industry and, particularly, the optimal size of the utilities and the financial sustainability of water services in the southern regions and isles.
... The results of the efficiency analysis are made by analyzing the data of the indicators mentioned above, and our results can be clearer when the sample size and indicators chosen are smaller [13]. Therefore, we need to select the suitable indicators for the different effects that we want to achieve when running the data, and we can leave a latitude in the choice of input and output indicators. ...
Article
Based on the panel data of star-rated hotels in China from 2015 to 2021, the following conclusions were drawn by using the DEA-Malmquist index method to investigate the efficiency of service innovation of star-rated hotels in China and to analyze the factors that influenced the change in the efficiency of star-rated hotels in China. (1) The service innovation efficiency of high-star hotels is highly sensitive to changes in the external environment, while low-star hotels prove relatively resilient under unfavorable external circumstances. (2) Technological progress is an important reason for the promotion of service innovation efficiency, and especially the key reason for the improvement of service innovation efficiency of high-star hotels. (3) The expansion of enterprise scale has a more obvious inhibiting effect on the improvement of service innovation efficiency of low-star hotels. (4) After the epidemic, pure technical efficiency can obviously promote the service innovation efficiency.
... Additionally, Nataraja and Johnson [11] made a comprehensive comparison of some of the variable selection methods. Concepts developed by Peyrache et al. [12] and Lee and Cai [13] should also be mentioned due to their use of modern approaches which, respectively, involve the introduction of cardinality constraints directly into a DEA model and the use of the Least Absolute Shrinkage and Selection Operator (LASSO), both aimed at reducing the number of variables. Finally, it seems important to emphasize at this point that, contrary to many other quantitative methods, in the case of DEA, a strong correlation between any two variables describing the DMUs does not justify a decision to remove one of them, because, as Nunamaker [14] pointed out, removing a highly correlated variable may substantially alter DEA efficiency evaluations. ...
Article
Data envelopment analysis (DEA) is a popular and universal method for examining the efficiency with which decision-making units (DMUs) transform multiple inputs into multiple outputs. However, DEA has its limitations, one of them being its decreasing discriminatory power when the number of analyzed DMUs is insufficient or when there are too many variables (inputs/outputs) describing them. When resigning from any of the variables is impossible or undesired, or when the number of units cannot be increased, CI-DEA, a method proposed in this article, proves to be helpful. It consists of replacing the inputs and/or outputs of the studied DMUs with a smaller number of composite indicators. The aggregation of variables is not based on subjective decisions of the analyst, but depends solely on correlations that exist among variables. The construction of the CI-DEA model makes the interpretation of the results unambiguous and easy. The reliability of the results obtained with CI-DEA have been confirmed by extensive simulation studies performed under conditions of predetermined real-efficiency of DMUs. The usefulness of CI-DEA on real data has been demonstrated on the example of the efficiency assessment of the digitalization in the life of the Generation 50+ in 32 European countries.
... Furthermore, no in-depth quantitative studies of existing patent partnership invention modes or the evolution direction of China's government have been conducted to date, so the quantitative analysis still needs to be conducted. As a result, in order to fill this analysis void, this thesis investigates the following topics (Peyrache et al. 2020). Data envelopment analysis (DEA) is thought to be a more efficient performance calculation technique than traditional econometric methods such as regression or ratio analysis. ...
Article
This study measures the association between resources and the atmosphere; social and environmental aspects of energy production have become critical. In this context, the aim of this research is to explore the mediating effect of renewable energy patents in developing potential frameworks for energy policy viewpoints on the climate. The study took panel data from 2010 to 2017 and used a non-radial data envelopment analysis (DEA) process and panel data model for 30 Chinese provinces. The findings indicate that between 2010 and 2017, the average environmental efficiency index (EPI) of Chinese areas increased by 9.88%. When firms’ internal variables are proxied by their commodity (revenue), the relationship term’s point approximate coefficient is about 0.05. This magnitude means that a 1% rise in a company’s assets will result in a 5% increase is estimated to be about 0.157, implying that a 1% rise in firm leverage is correlated with a 15.7%. Finally, based on the study results, some policy implications were proposed.
... The research scopes are mainly the construction and operation stages of buildings, which account for more than 80% of the energy consumption of the building sector [47]. Relevant data are primarily ... Building activity capital stock (BCS, billion yuan), labor force (L, 10^4 employees), and energy consumption (E, 10^4 tons of coal equivalent) are selected as input indicators; the total output value of building activities (BTV, billion yuan) serves as the desirable output indicator; carbon emissions from building activities (CO2, 10^4 tons) are used as the undesirable output indicator [48]. Table 1 shows the summarized statistics of the five variables. ...
Article
The improvement of the energy and carbon emission efficiency of activities in the building sector is the key to China’s realization of the Paris Agreement. We can explore effective emission abatement approaches for the building sector by evaluating the carbon emissions and energy efficiency of construction activities, measuring the emission abatement potential of construction activities across the country and regions, and measuring the marginal abatement cost (MAC) of China and various regions. This study calculates the energy and carbon emissions performance of the building sector of 30 provinces and regions in China from 2005 to 2015, measures the dynamic changes in the energy-saving potential and carbon emission performance of the building sector, conducts relevant verification, and estimates the MAC of the building sector by using the slacks-based measure-directional distance function. The level of energy consumption per unit of the building sector of China has been decreasing yearly, but the energy structure has changed minimally (considering that clean energy is used). The total factor technical efficiency of the building sector of various provinces, cities, and regions is generally low, as verified in the evaluation of the energy-saving and emission abatement potential of the building sector of China. The energy saving and emission abatement of the building sector of China have great potential—that is, in approximately 50% of the total emissions of the building sector of China. In particular, Northeast and North China account for more than 50% of the total energy-saving and emission abatement potential. The study of the CO2 emissions and MAC of the building sector indicates that the larger the CO2 emissions are, the smaller MAC will be. The emission abatement efficiency is proportional to MAC. 
Based on this research, it can be more equitable and effective in formulating provincial emission reduction policy targets at the national level, and can maximize the contribution of the building sector of various provinces to the national carbon emission reduction.
Article
This study applies a multistep fuzzy stochastic procedure to evaluate Turkish health system efficiency by comparing crisp and stochastic efficiency estimates blending machine learning predictors. Conventional, bias-corrected, and fuzzy data envelopment analysis (DEA) estimates are employed and compared to explore province-based health systems’ efficiency scores. Fuzzy DEA α-level models are used to assess underlying uncertainty, yielding fuzzy results by changing 10 different alpha (α)-cut parameters from 0.10 to 1. Data are obtained from the official statistics of the Turkish Statistical Institute, and cross-province efficiency comparisons are performed through spatial analysis of the best and worst performers. A Pythagorean forest is constructed incorporating random forest regression to identify the most accurate predictors of province-based efficiency scores. The results reveal that bias correction and fuzziness outperform conventional efficiency analysis. High efficiency scores are observed when the α-cut parameter in the fuzzy DEA application is increased. High correlations are observed between efficiency scores elicited from crisp and stochastic DEA estimates (r_s > 90). The spatial distribution of average fuzzy DEA scores (α = 1) for seven geographic regions are presented on a map of Turkey. Finally, considering the imprecision of the fuzzy DEA estimates, fuzzy DEA efficiency scores are used to identify the predictors of health system fuzzy efficiency scores. The Pythagorean forest demonstrates that the most important predictor of province-based fuzzy efficiency scores is the number of physicians. The average efficiency values obtained from the conventional DEA model are outstanding in comparison to bias-corrected and fuzzy DEA estimates. Future studies could compare crisp and fuzzy efficiency estimates using large spatial datasets.
Article
Suitable spatial morphology of cultivated land is a basic requirement for sustaining agricultural economic development in mountainous areas. Measuring the coordinated development efficiency of cultivated land spatial morphology and agricultural economy (CECA) is of great practical significance for assessing the efficiency of cultivated land use and thereby promoting regional rural revitalization. However, few studies to date have focused on coordinated development efficiency between cultivated land use and agricultural economy in mountainous areas from the perspective of cultivated land spatial morphology. Thus, the present study explores CECA with this focus using the data envelopment analysis method, and analyzes the key influencing factors via a geographical detector model in 16 counties in western Hubei province. The results show the following: (1) CECA exhibits significant spatial heterogeneity that is high in the south of the study area and low in the north; (2) scale efficiency is the primary limiting factor for CECA; (3) the insufficient output of cultivated land use mainly restricts CECA in the south of the study area, while individual counties in the north suffer from input redundancy and insufficient output; and (4) population density in the southern region has the most significant effect on CECA, and gross domestic product has the greatest impact in the northern region. The results contribute to the derivation of specific measures by which to promote cultivated land use efficiency and sustainable development of the social economy.
Article
Full-text available
Data Envelopment Analysis (DEA) is one of the first methods that comes to mind in every field where efficiency, productivity, or performance is rapidly gaining importance. In this paper, relative efficiency in the agricultural sector is measured for the year 2019 using DEA. The study covers the 20 provinces of Turkey with the highest crop production revenue. 'Cultivated Agricultural Area', 'Agricultural Mechanization', 'Energy Used in Agricultural Irrigation', and 'Fertilizer Consumption' are used as inputs, while 'Vegetable and Fruit Production' and 'Grain and Other Crop Production' are used as outputs. According to the results, 11 provinces are efficient and 9 are inefficient under the CCR model, whereas 10 provinces are efficient and 10 are inefficient under the BCC model. In the concluding section, the efficiencies of the provinces are compared and interpreted, and performance improvements are made accordingly.
Preprint
Full-text available
Data Envelopment Analysis (DEA) allows us to capture the complex relationship between multiple inputs and outputs in firms and organizations. Unfortunately, managers may find it hard to understand a DEA model and this may lead to mistrust in the analyses and to difficulties in deriving actionable information from the model. In this paper, we propose to use the ideas of target setting in DEA and of counterfactual analysis in Machine Learning to overcome these problems. We define DEA counterfactuals or targets as alternative combinations of inputs and outputs that are close to the original inputs and outputs of the firm and lead to desired improvements in its performance. We formulate the problem of finding counterfactuals as a bilevel optimization model. For a rich class of cost functions, reflecting the effort an inefficient firm will need to spend to change to its counterfactual, finding counterfactual explanations boils down to solving Mixed Integer Convex Quadratic Problems with linear constraints. We illustrate our approach using both a small numerical example and a real-world dataset on banking branches.
Article
Full-text available
The coronavirus outbreak has been highly disruptive for the aviation sector. There is a strong correlation between COVID-19 related news, volatility in transportation, low confidence in travel safety, and uncertainty in this era. In this research, we study and distinguish COVID-19's impact on U.S. airlines' performance. Network and low-cost carriers responded differently in terms of capacity reduction, market share reduction, scheduled flights reduction, flight cancellations, and service quality in the year 2020. By applying Network Data Envelopment Analysis, we show that low-cost carriers had higher efficiency than network carriers during the pandemic. Furthermore, we study the effects on U.S. airlines' efficiency of two key factors that emerge from COVID-19: the government's stringency actions and passengers' panic. Our analysis demonstrates that the negative effect is more significant for passengers' panic than for the government's stringency measures. In addition, we show that passengers' panic has more impact on the efficiency of network carriers than on that of low-cost carriers.
Article
Full-text available
The appropriate selection of inputs and outputs, as well as their number, is a crucial step in achieving relevant results for any study that measures the overall efficiency of a set of decision-making units using the data envelopment analysis methodology. In the literature, however, there is still no definitive standard to guide this selection. This article offers a novel two-phase procedure. The first phase solves an integer non-linear program to determine the most suitable number of inputs and outputs to be used. The second phase follows a newly developed plithogenic multi-criteria decision-making method to specify the inputs and outputs that contribute most to improving efficiency and should be considered in the DEA model. A running example is applied throughout this article to make the proposed procedure more comprehensible. It consists of selecting variables for measuring the overall efficiency of the Northern Border University in Saudi Arabia. The obtained results account for both the uncertainty of the data and the indeterminacy in experts' judgments of inputs and outputs.
Article
This paper evaluates the efficiency frontier of 34 mobile operators from OECD countries and compares the performance of multinational companies and domestic companies between 2014 and 2018. Unlike most previous studies, the present paper relies on non-financial data for both input and output variables. It uses slack-based Data Envelopment Analysis (SBM DEA) to obtain the efficiency scores. The efficiency scores were compared for statistically significant differences using the Mann-Whitney U test. Our findings showed domestic companies to be more efficient than multinational ones regardless of their scale. The suggested explanation is that domestic enterprises are better at managing their resources and more familiar with their local market. The results also indicate that the primary source of inefficiency was inadequate utilisation of the available spectrum range. These empirical findings provide extra insight to managers in the industry on possible steps to reduce inefficiency from the perspective of non-financial data.
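As an illustration of the comparison step, the Mann-Whitney U statistic for two groups of efficiency scores can be sketched as follows. The function name and the scores are hypothetical, and the sketch computes only the U statistics, not the p-value that a full test would report.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two samples; ties count as half a win."""
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    u_b = len(a) * len(b) - u_a  # the two statistics always sum to len(a) * len(b)
    return u_a, u_b


# Hypothetical efficiency scores for two groups of operators.
domestic = [0.9, 0.8, 0.7]
multinational = [0.6, 0.5, 0.85]
u_domestic, u_multinational = mann_whitney_u(domestic, multinational)
```

A U value far from half the number of pairs suggests that one group tends to score higher; significance would still have to be read from the U distribution or a normal approximation.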
Article
Data with large dimensions will bring various problems to the application of data envelopment analysis (DEA). In this study, we focus on a “big data” problem related to the considerably large dimensions of the input-output data. The four most widely used approaches to guide dimension reduction in DEA are compared via Monte Carlo simulation, including principal component analysis (PCA-DEA), which is based on the idea of aggregating input and output, efficiency contribution measurement (ECM), average efficiency measure (AEC), and regression-based detection (RB), which is based on the idea of variable selection. We compare the performance of these methods under different scenarios and a brand-new comparison benchmark for the simulation test. In addition, we discuss the effect of initial variable selection in RB for the first time. Based on the results, we offer guidelines that are more reliable on how to choose an appropriate method.
Article
One of the main challenges when applying data envelopment analysis (DEA) is the selection of appropriate input and output variables. This paper addresses this important problem using a novel two-stage method. In the first stage, we use entropy theory to generate a comprehensive efficiency score (CES) of each decision-making unit. In the second stage, we select input and output variables using the Bayesian information criterion, when CES is treated as a dependent variable and the input and output variables are used as explanatory variables. We use stochastic data to demonstrate that our proposed method can improve the discrimination power of DEA and determine the important input and output variables. Finally, we compare the proposed method with principal component analysis using datasets on carbon emissions in China. This comparison demonstrates the practical value of our proposed method.
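One common way to combine several efficiency scores with entropy is the Shannon entropy weight method, sketched below. This is an assumption about the general technique, not the authors' exact first stage; the table layout (rows are units, columns are candidate DEA models) and all names are hypothetical.

```python
import math


def entropy_weights(scores):
    """Shannon entropy weights for the columns of a score table.

    Columns whose scores vary more across units carry more information
    and receive larger weights. Assumes positive scores and at least one
    non-constant column (otherwise the normalisation divides by zero).
    """
    n_units = len(scores)
    n_cols = len(scores[0])
    raw = []
    for j in range(n_cols):
        column = [row[j] for row in scores]
        total = sum(column)
        shares = [value / total for value in column]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n_units)
        raw.append(1.0 - entropy)  # degree of diversification of column j
    norm = sum(raw)
    return [value / norm for value in raw]


def comprehensive_scores(scores):
    """Weighted sum of each unit's scores using the entropy weights."""
    weights = entropy_weights(scores)
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]


# Hypothetical scores of three units under two candidate models.
table = [[1.0, 1.0], [1.0, 0.5], [1.0, 0.2]]
weights = entropy_weights(table)
ces = comprehensive_scores(table)
```

In this toy table the first model rates every unit as efficient, so it carries almost no weight and the combined score is driven by the second model.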
Article
With the increasing environmental pollution and environmental awareness, ecological efficiency has become a hot issue. Based on the traditional DEA model of unexpected output, a new super data envelopment analysis and slacks-based measure considering undesirable outputs (S-DEA-SBM-UO) efficiency model is proposed. Moreover, the weight S-DEA-SBM-UO (W–S-DEA-SBM-UO) is applied to evaluate the ecological efficiency of industrial enterprises. The experimental results show that the W–S-DEA-SBM-UO can measure the ecological efficiency including unexpected output more effectively than the conventional DEA-SBM to solve unexpected output. Finally, it is suggested that enterprises should make great efforts to improve the ecological efficiency and it is pointed out that the results have strong theoretical significance and research value for ecological improvement.
Article
Full-text available
The efficiency of banks has a critical role in development of sound financial systems of countries. Data Envelopment Analysis (DEA) has witnessed an increase in popularity for modeling the performance efficiency of banks. Such efficiency depends on the appropriate selection of input and output variables. In literature, no agreement exists on the selection of relevant variables. The disagreement has been an on-going debate among academic experts, and no diagnostic tools exist to identify variable misspecifications. A cognitive analytics management framework is proposed using three processes to address misspecifications. The cognitive process conducts an extensive review to identify the most common set of variables. The analytics process integrates a random forest method; a simulation method with a DEA measurement feedback; and Shannon Entropy to select the best DEA model and its relevant variables. Finally, a management process discusses the managerial insights to manage performance and impacts. A sample of data is collected on 303 top-world banks for the periods 2013 to 2015 from 49 countries. The experimental simulation results identified the best DEA model along with its associated variables, and addressed the misclassification of the total deposits. The paper concludes with the limitations and future research directions.
Article
Hospital efficiency and equity in health care delivery are two enduring research topics. Yet little research has been done to examine the relationship between them. This paper studies the impact of hospital efficiency on equity in health care delivery based on a proprietary dataset of hospital characteristics and 630,000 inpatient records from 149 public hospitals in a representative Chinese city. To measure the hospitals' efficiencies, this study takes the hospitals' operational features and case-mix indexes into account, and computes the efficiency levels using data envelopment analysis with bootstrapping. Through regressions that control for a variety of the patients’ personal characteristics (e.g., age, disease, residence, hospital visit frequency), this study shows that the gap between hospitalization expenses of urban and rural inpatients in more efficient hospitals is smaller than those in less efficient hospitals. Thus efficiency enhances equity in expenditure between urban and rural patients. But the dwindling urban-rural gap in expenditure is achieved by raising the spending of rural patients, thereby undermining their access to health care. This pattern is more conspicuous in large and sophisticated high-tier hospitals. Further analysis shows that hospital efficiency impacts equity of health care delivery by inducing different lengths of stay and uncovered parts of total expenditure for urban and rural groups. The findings imply that an efficiency-oriented health care policy may lead to social benefit loss.
Article
Full-text available
This paper proposes an integrative approach to feature (input and output) selection in Data Envelopment Analysis (DEA). The DEA model is enriched with zero-one decision variables modelling the selection of features, yielding a Mixed Integer Linear Programming formulation. This single-model approach can handle different objective functions as well as constraints to incorporate desirable properties from the real-world application. Our approach is illustrated on the benchmarking of electricity Distribution System Operators (DSOs). The numerical results highlight the advantages that our single-model approach provides to the user, in terms of making the choice of the number of features, as well as modeling their costs and their nature.
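A generic way to attach zero-one selection variables to a DEA program is a big-M link between each multiplier and its binary indicator. The sketch below does this for the standard input-oriented multiplier model of a single unit $o$; it is an illustration under assumptions, not the authors' formulation, which selects features in one model for all units. The symbols $z^{I}_i$, $z^{O}_r$, the constant $M$, and the budget $p$ are introduced here for illustration only.

$$
\begin{aligned}
\max_{u,\,v,\,z}\quad & \sum_{r} u_r\, y_{ro} \\
\text{s.t.}\quad & \sum_{i} v_i\, x_{io} = 1, \\
& \sum_{r} u_r\, y_{rj} - \sum_{i} v_i\, x_{ij} \le 0 \quad \text{for all units } j, \\
& 0 \le v_i \le M z^{I}_i, \qquad 0 \le u_r \le M z^{O}_r, \\
& \sum_{i} z^{I}_i + \sum_{r} z^{O}_r \le p, \\
& z^{I}_i,\ z^{O}_r \in \{0, 1\}.
\end{aligned}
$$

A feature whose indicator is zero is forced to a zero multiplier and so drops out of the efficiency score, while the last inequality caps the number of features that may be used.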
Article
Full-text available
This study surveys the data envelopment analysis (DEA) literature by applying a citation-based approach. The main goals are to find a set of papers playing the central role in DEA development and to discover the latest active DEA subareas. A directional network is constructed based on citation relationships among academic papers. After assigning an importance index to each link in the citation network, main DEA development paths emerge. We examine various types of main paths, including local main path, global main path, and multiple main paths. The analysis result suggests, as expected, that Charnes et al. (1978) [Charnes A, Cooper WW, Rhodes E. Measuring the efficiency of decision making units. European Journal of Operational Research 1978; 2(6): 429–444] is the most influential DEA paper. The five most active DEA subareas in recent years are identified; among them the “two-stage contextual factor evaluation framework” is relatively more active. Aside from the main path analysis, we summarize basic statistics on DEA journals and researchers. A growth curve analysis hints that the DEA literature’s size will eventually grow to at least double the size of the existing literature.
Article
Full-text available
Despite their increasing use, composite indicators remain controversial. The undesirable dependence of countries’ rankings on the preliminary normalization stage, and the disagreement among experts/stakeholders on the specific weighting scheme used to aggregate sub-indicators, are often invoked to undermine the credibility of composite indicators. Data envelopment analysis may be instrumental in overcoming these limitations. One part of its appeal in the composite indicator context stems from its invariance to measurement units, which entails that a normalization stage can be skipped. Secondly, it fills the informational gap in the ‘right’ set of weights by generating flexible ‘benefit of the doubt’-weights for each evaluated country. The ease of interpretation is a third advantage of the specific model that is the main focus of this paper. In sum, the method may help to neutralize some recurring sources of criticism on composite indicators, allowing one to shift the focus to other, and perhaps more essential stages of their construction.
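For reference, the benefit-of-the-doubt composite indicator is usually written as the following linear program for country $c$, where $y_{ji}$ is the value of sub-indicator $i$ for country $j$ and $w_{ci}$ are the weights chosen most favourably for $c$. This is the standard textbook form; the notation is assumed here and may differ from the paper's.

$$
CI_c = \max_{w_{c}} \sum_{i} w_{ci}\, y_{ci}
\quad \text{s.t.} \quad
\sum_{i} w_{ci}\, y_{ji} \le 1 \ \ \text{for all countries } j,
\qquad w_{ci} \ge 0 .
$$

Because rescaling a sub-indicator is absorbed by its weight, the optimal value does not depend on measurement units, which is why the normalization stage can be skipped.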
Article
Full-text available
Since the original Data Envelopment Analysis (DEA) study by Charnes et al. [Measuring the efficiency of decision-making units. European Journal of Operational Research 1978;2(6):429–44], there has been rapid and continuous growth in the field. As a result, a considerable amount of published research has appeared, with a significant portion focused on DEA applications of efficiency and productivity in both public and private sector activities. While several bibliographic collections have been reported, a comprehensive listing and analysis of DEA research covering its first 30 years of history is not available. This paper thus presents an extensive, if not nearly complete, listing of DEA research covering theoretical developments as well as “real-world” applications from inception to the year 2007. A listing of the most utilized/relevant journals, a keyword analysis, and selected statistics are presented.
Article
Full-text available
Model misspecification has significant impacts on data envelopment analysis (DEA) efficiency estimates. This paper discusses the four most widely-used approaches to guide variable specification in DEA. We analyze efficiency contribution measure (ECM), principal component analysis (PCA-DEA), a regression-based test, and bootstrapping for variable selection via Monte Carlo simulations to determine each approach’s advantages and disadvantages. For a three input, one output production process, we find that: PCA-DEA performs well with highly correlated inputs (greater than 0.8) and even for small data sets (less than 300 observations); both the regression and ECM approaches perform well under low correlation (less than 0.2) and relatively larger data sets (at least 300 observations); and bootstrapping performs relatively poorly. Bootstrapping requires hours of computational time whereas the three other methods require minutes. Based on the results, we offer guidelines for effectively choosing among the four selection methods.
Article
Full-text available
Nonparametric data envelopment analysis (DEA) estimators have been widely applied in analysis of productive efficiency. Typically they are defined in terms of convex-hulls of the observed combinations of $\mathrm{inputs}\times\mathrm{outputs}$ in a sample of enterprises. The shape of the convex-hull relies on a hypothesis on the shape of the technology, defined as the boundary of the set of technically attainable points in the $\mathrm{inputs}\times\mathrm{outputs}$ space. So far, only the statistical properties of the smallest convex polyhedron enveloping the data points have been considered, which corresponds to a situation where the technology presents variable returns-to-scale (VRS). This paper analyzes the case where the most common constant returns-to-scale (CRS) hypothesis is assumed. Here the DEA is defined as the smallest conical-hull with vertex at the origin enveloping the cloud of observed points. In this paper we determine the asymptotic properties of this estimator, showing that the rate of convergence is better than for the VRS estimator. We also derive its asymptotic sampling distribution with a practical way to simulate it. This allows us to define a bias-corrected estimator and to build confidence intervals for the frontier. We compare in a simulated example the bias-corrected estimator with the original conical-hull estimator and show its superiority in terms of median squared error. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOS746
Article
Full-text available
This paper discusses statistical procedures for testing various restrictions in the context of nonparametric models of technical efficiency. In particular, tests for whether inputs or outputs are irrelevant, as well as tests of whether inputs or outputs may be aggregated are formulated. Bootstrap estimation procedures which yield appropriate critical values for the test statistics are also provided. Evidence on the true sizes and power of the proposed tests is obtained from Monte Carlo experiments.
Article
The curse of dimensionality problem arises when a limited number of observations are used to estimate a high-dimensional frontier, in particular, by data envelopment analysis (DEA). The study uses a data generating process (DGP) to argue that the typical “rule of thumb” used in DEA, e.g. that the number of observations should be at least twice the number of inputs and outputs, is ambiguous and will produce large deviations in estimating technical efficiency. To address this issue, we propose a Least Absolute Shrinkage and Selection Operator (LASSO) variable selection technique, which is usually used in data science for extracting significant factors, and combine it with sign-constrained convex nonparametric least squares (SCNLS), which can be regarded as a DEA estimator. Simulation results demonstrate that the proposed LASSO-SCNLS method and its variants provide useful guidelines for DEA with small datasets.
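The rule of thumb questioned in this abstract is easy to state as a check, shown below as a minimal sketch. The function name and the default multiplier are assumptions; other multipliers appear in the literature, which is part of why the rule is ambiguous.

```python
def meets_rule_of_thumb(n_units, n_inputs, n_outputs, factor=2):
    """True if the sample has at least factor * (inputs + outputs) units."""
    return n_units >= factor * (n_inputs + n_outputs)


# Hypothetical design with 4 inputs and 2 outputs: the threshold is 12 units.
enough = meets_rule_of_thumb(20, 4, 2)
too_few = meets_rule_of_thumb(10, 4, 2)
```

Passing this check says nothing about estimation accuracy, which is the point the study makes with its simulations.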
Article
Data Envelopment Analysis (DEA) has recently emerged as an effective method for the sustainability assessment of industrial systems. Unfortunately, sustainability studies require the evaluation of a wide range of indicators (i.e. inputs and outputs in DEA notation), which can weaken the discriminatory power of DEA and ultimately lead to results that are less meaningful and hard to interpret. Here we develop a systematic MILP-DEA approach that identifies redundant metrics that can be omitted in DEA models with minimum information loss. Our approach is based on a bi-level programming model where binary variables denote the selection of the metrics and the objective functions and constraints are formulated according to the DEA models. The capabilities of this method are illustrated through the assessment of several industrial systems evaluated according to multiple criteria, some of which are based on life cycle metrics. Our results show that our systematic approach can effectively reduce the number of variables in DEA studies. This method can also be used to enhance the discriminatory power of DEA by diminishing the number of units deemed efficient considering a maximum allowable error.
Article
It is well-known that the convergence rates of nonparametric efficiency estimators (e.g., free-disposal hull and data envelopment analysis estimators) become slower with increasing numbers of input and output quantities (i.e., dimensionality). Dimension reduction is often utilized in non-parametric density and regression where similar problems occur, but has been used in only a few instances in the context of efficiency estimation. This paper explains why the problem occurs in nonparametric models of production and proposes three diagnostics for when dimension reduction might lead to more accurate estimation of efficiency. Simulation results provide additional insight, and suggest that in many cases dimension reduction is advantageous in terms of reducing estimation error. The simulation results also suggest that when dimensionality is reduced, free-disposal hull estimators become an attractive, viable alternative to the more frequently used (and more restrictive) data envelopment analysis estimators. In the context of efficiency estimation, these results provide the first quantification of the tradeoff between information lost versus improvement in estimation error due to dimension reduction. Results from several papers in the literature are revisited to show what might be gained from reducing dimensionality and how interpretations might differ.
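The free-disposal hull (FDH) estimator mentioned here needs no linear programming, only pairwise comparisons. The following is a minimal input-oriented sketch; the function name and data are hypothetical, and strictly positive inputs are assumed.

```python
def fdh_input_efficiency(X, Y, o):
    """Input-oriented FDH score of unit o.

    X[j] and Y[j] are the input and output vectors of unit j. Among the
    observed units that produce at least as much of every output as unit o,
    find the one allowing the largest proportional input reduction.
    """
    best = float("inf")
    for x_j, y_j in zip(X, Y):
        dominates_outputs = all(a >= b for a, b in zip(y_j, Y[o]))
        if dominates_outputs:
            # Smallest factor that scales unit o's inputs down to cover x_j.
            ratio = max(a / b for a, b in zip(x_j, X[o]))
            best = min(best, ratio)
    return best  # at most 1.0, because unit o always qualifies itself


# Hypothetical data: one input, one output, four units.
X = [[2.0], [4.0], [8.0], [6.0]]
Y = [[1.0], [4.0], [6.0], [3.0]]
scores = [fdh_input_efficiency(X, Y, o) for o in range(len(X))]
```

Unlike DEA, FDH compares each unit only with observed units rather than with convex combinations of them, which is the sense in which it is less restrictive.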
Article
An important problem in the public sector, given the lack of output prices and exit decisions to sanction inefficient units, is finding the optimal industry structure. We apply a novel approach to Italian courts of justice, a typical example of a small sector in the public domain but with important effects on economic agents' behavior, firm size, FDI, and on the overall economy. The suggested approach enables us to break down the aggregate court inefficiency into different sources and to investigate the optimal structure of the justice sector. Results show that technical inefficiency (lack of best practice) accounts for more than one third (38%) of total inefficiency, while size inefficiency (courts that are too big) is about 22-25%. The remaining inefficiency is represented by a sub-optimal allocation of inputs (30-40%). If reallocation is confined to macroregional or regional borders, then technical and size inefficiencies increase in relative terms compared to reallocation inefficiency. We suggest that, together with reallocating inputs by merging smaller courts, a complementary set of policy interventions would be to adopt best practices and split larger courts.
Article
We use various recent scale ranking methods in the DEA (Data Envelopment Analysis) context. Two methods are based on multivariate statistical analysis: canonical correlation analysis (CCA) and discriminant analysis of ratios (DR/DEA), while the third is based on the cross efficiency matrix (CE/DEA) derived from the DEA. This multirank approach is necessary for rank validation of the model. Their consistency and goodness of fit with the DEA are tested by various nonparametric statistical tests. Once we had validated the consistency among the ranking methods, we constructed a new overall rank combining all of them. Actually, given the DEA results, we here provide ranks that complement the DEA for a full ranking scale beyond the mere classification to two dichotomic groups. This new combined ranking method does not replace the DEA, but it adds a post-optimality analysis to the DEA results. In this paper, we combine the ranking approach with stochastic DEA: each approach is in the forefront of DEA. This is an attempt to bridge between the DEA frontier Pareto Optimum approach and the average approach used in econometrics. Furthermore, the quality of this bridge is tested statistically and thus depends on the data. We demonstrate this method for fully ranking the Industrial Branches in Israel. In order to delete unmeaningful input and output variables, and to increase the fitness between the DEA and the ranking, we utilize the canonical correlation analysis to select the meaningful variables. Furthermore, we run the ranking methods on two sets of variables to select the proper combination of variables which best represents labor.
Article
In Data Envelopment Analysis (DEA), the more inputs and outputs there are, the more Decision Making Units (DMUs) are rated efficient. For example, if the specific inputs or outputs advantageous for a particular DMU are used, the DMU will become efficient. Usually the variables used as inputs or outputs are correlated. Therefore, the inputs and outputs should be selected appropriately by experts who know their characteristics very well. People who are less familiar with those characteristics require tools to assist in the selection. We propose using principal component analysis as a means of weighting inputs and/or outputs and summarizing them parsimoniously rather than selecting them. A basic model and its modification are proposed. In principal component analysis, many weights for the variables that define principal components (PCs) have negative values. This may cause a negative integrated input that is a denominator of the objective function in fractional programming. The denominator should be positive. In the basic model, a condition that the denominator must be positive is added. When the number of PCs is less than the number of original variables, a part of the original information is neglected. In the modified model, a part of the neglected information is also used.
Article
We evaluate, by means of mathematical programming formulations, the relative technical and scale efficiencies of decision making units (DMUs) when some of the inputs or outputs are exogenously fixed and beyond the discretionary control of DMU managers. This approach further develops the work on efficiency evaluation and on estimation of efficient production frontiers known as data envelopment analysis (DEA). We also employ the model to provide efficient input and output targets for DMU managers in a way that specifically accounts for the fixed nature of some of the inputs or outputs. We illustrate the approach, using real data, for a network of fast food restaurants.
Article
This paper provides a review of the evolution, development and future research directions on the use of weights restrictions and value judgements in Data Envelopment Analysis. The paper argues that the incorporation of value judgements in DEA was motivated by applications of the method in real life organisations. The application driven development of the methods has led to a number of different approaches in the literature which have inevitably different uses and interpretations. The paper concentrates on the implications of weights restrictions on the efficiency, targets and peer comparators of inefficient Decision Making Units. The paper concludes with future research directions in the area of value judgements and weights restrictions.
Article
In management contexts, mathematical programming is usually used to evaluate a collection of possible alternative courses of action en route to selecting one which is best. In this capacity, mathematical programming serves as a planning aid to management. Data Envelopment Analysis reverses this role and employs mathematical programming to obtain ex post facto evaluations of the relative efficiency of management accomplishments, however they may have been planned or executed. Mathematical programming is thereby extended for use as a tool for control and evaluation of past accomplishments as well as a tool to aid in planning future activities. The CCR ratio form introduced by Charnes, Cooper and Rhodes, as part of their Data Envelopment Analysis approach, comprehends both technical and scale inefficiencies via the optimal value of the ratio form, as obtained directly from the data without requiring a priori specification of weights and/or explicit delineation of assumed functional forms of relations between inputs and outputs. A separation into technical and scale efficiencies is accomplished by the methods developed in this paper without altering the latter conditions for use of DEA directly on observational data. Technical inefficiencies are identified with failures to achieve best possible output levels and/or usage of excessive amounts of inputs. Methods for identifying and correcting the magnitudes of these inefficiencies, as supplied in prior work, are illustrated. In the present paper, a new separate variable is introduced which makes it possible to determine whether operations were conducted in regions of increasing, constant or decreasing returns to scale (in multiple input and multiple output situations). The results are discussed and related not only to classical (single output) economics but also to more modern versions of economics which are identified with "contestable market theories."
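The separation into technical and scale efficiency can be illustrated with the envelopment form of the two models: the constant-returns (CCR) score, the variable-returns (BCC) score obtained by adding a convexity constraint, and their ratio as scale efficiency. The sketch below uses `scipy.optimize.linprog`; the function name and data are hypothetical, and it is an input-oriented illustration rather than the paper's own notation.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_efficiency(X, Y, o, vrs=False):
    """Input-oriented DEA score of unit o (envelopment form).

    X has shape (n_units, n_inputs), Y has shape (n_units, n_outputs).
    Decision variables are [theta, lambda_1, ..., lambda_n].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0  # minimise theta
    # Inputs: sum_j lambda_j * x_ij - theta * x_io <= 0
    A_inputs = np.hstack([-X[o][:, None], X.T])
    # Outputs: -sum_j lambda_j * y_rj <= -y_ro
    A_outputs = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T])
    A_ub = np.vstack([A_inputs, A_outputs])
    b_ub = np.concatenate([np.zeros(X.shape[1]), -Y[o]])
    A_eq = b_eq = None
    if vrs:
        # Convexity constraint sum_j lambda_j = 1 gives variable returns to scale.
        A_eq = np.ones((1, n + 1))
        A_eq[0, 0] = 0.0
        b_eq = [1.0]
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                     bounds=[(0, None)] * (n + 1), method="highs")
    return float(result.fun)


# Hypothetical data: one input, one output, four units.
X = [[2.0], [4.0], [8.0], [6.0]]
Y = [[1.0], [4.0], [6.0], [3.0]]
crs_score = dea_input_efficiency(X, Y, 0)
vrs_score = dea_input_efficiency(X, Y, 0, vrs=True)
scale_efficiency = crs_score / vrs_score
```

For the first unit the variable-returns score is 1 while the constant-returns score is 0.5, so all of its measured inefficiency is attributed to operating at an unfavourable scale.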
Article
A model for measuring the efficiency of Decision Making Units (=DMU's) is presented, along with related methods of implementation and interpretation. The term DMU is intended to emphasize an orientation toward managed entities in the public and/or not-for-profit sectors. The proposed approach is applicable to the multiple outputs and designated inputs which are common for such DMU's. A priori weights, or imputations of a market-price-value character are not required. A mathematical programming model applied to observational data provides a new way of obtaining empirical estimates of extremal relations, such as the production functions and/or efficient production possibility surfaces that are a cornerstone of modern economics. The resulting extremal relations are used to envelop the observations in order to obtain the efficiency measures that form a focus of the present paper. An illustrative application utilizes data from Program Follow Through (=PFT). A large scale social experiment in public school education, it was designed to test the advantages of PFT relative to designated NFT (=Non-Follow Through) counterparts in various parts of the U.S. It is possible that the resulting observations are contaminated with inefficiencies due to the way DMU's were managed en route to assessing whether PFT (as a program) is superior to its NFT alternative. A further mathematical programming development is therefore undertaken to distinguish between “management efficiency” and “program efficiency.” This is done via procedures referred to as Data Envelopment Analysis (=DEA) in which one first obtains boundaries or envelopes from the data for PFT and NFT, respectively. These boundaries provide a basis for estimating the relative efficiency of the DMU's operating under these programs. 
These DMU's are then adjusted up to their program boundaries, after which a new inter-program envelope is obtained for evaluating the PFT and NFT programs with the estimated managerial inefficiencies eliminated. The claimed superiority of PFT fails to be validated in this illustrative application. Our DEA approach, however, suggests the additional possibility of new approaches obtained from PFT-NFT combinations which may be superior to either of them alone. Validating such possibilities cannot be done only by statistical or other modelings. It requires recourse to field studies, including audits (e.g., of a U.S. General Accounting Office variety) and therefore ways in which the results of a DEA approach may be used to guide such further studies (or audits) are also indicated.
Article
A substantial body of recent work has opened the way to exploring the statistical properties of DEA estimators of production frontiers and related efficiency measures. The purpose of this paper is to survey several possibilities that have been pursued, and to present them in a unified framework. These include the development of statistics to test hypotheses about the characteristics of the production frontier, such as returns to scale, input substitutability, and model specification, and also about variation in efficiencies relative to the production frontier.
Article
US experience shows that deregulation of the airline industry leads to the formation of hub-and-spoke (HS) airline networks. Viewing potential HS networks as decision-making units, we use data envelopment analysis (DEA) to select the most efficient network configurations from the many that are possible in the deregulated European Union airline market. To overcome the difficulties that DEA encounters when there is an excessive number of inputs or outputs, we employ principal component analysis (PCA) to aggregate certain clustered data, whilst ensuring very similar results to those achieved under the original DEA model. The DEA–PCA formulation is then illustrated with real-world data gathered from the West European air transportation industry.
Article
A nonlinear (nonconvex) programming model provides a new definition of efficiency for use in evaluating activities of not-for-profit entities participating in public programs. A scalar measure of the efficiency of each participating unit is thereby provided, along with methods for objectively determining weights by reference to the observational data for the multiple outputs and multiple inputs that characterize such programs. Equivalences are established to ordinary linear programming models for effecting computations. The duals to these linear programming models provide a new way for estimating extremal relations from observational data. Connections between engineering and economic approaches to efficiency are delineated along with new interpretations and ways of using them in evaluating and controlling managerial behavior in public programs.
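For orientation, the ratio model and its linear programming equivalents referred to above are usually written as follows. This is the standard textbook statement rather than a quotation from the paper; the notation is an assumption introduced here: unit $0$ is the one being evaluated, $x_{ij}$ and $y_{rj}$ are the inputs and outputs of unit $j$, and $v_i$, $u_r$ are the weights (the original formulation also treats strict positivity of the weights, which is omitted in this sketch).

```latex
% Ratio (nonlinear) form: choose the weights most favourable to unit 0
\max_{u,v}\; h_0 = \frac{\sum_{r=1}^{s} u_r\, y_{r0}}{\sum_{i=1}^{m} v_i\, x_{i0}}
\quad \text{s.t.} \quad
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1,\;\; j = 1,\dots,n,
\qquad u_r,\, v_i \ge 0 .

% Equivalent linear program (normalise the weighted input of unit 0 to one)
\max_{u,v}\; \sum_{r=1}^{s} u_r\, y_{r0}
\quad \text{s.t.} \quad
\sum_{i=1}^{m} v_i\, x_{i0} = 1, \qquad
\sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0,\;\; j = 1,\dots,n,
\qquad u_r,\, v_i \ge 0 .

% Dual (envelopment) form, which estimates the extremal relation from the data
\min_{\theta,\lambda}\; \theta
\quad \text{s.t.} \quad
\sum_{j=1}^{n} \lambda_j\, x_{ij} \le \theta\, x_{i0},\;\; i = 1,\dots,m, \qquad
\sum_{j=1}^{n} \lambda_j\, y_{rj} \ge y_{r0},\;\; r = 1,\dots,s, \qquad
\lambda_j \ge 0 .
```

The first linear program follows from the ratio form because the ratio is unchanged when all weights are rescaled, so the denominator for unit $0$ can be fixed at one without loss of generality.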
Article
Data Envelopment Analysis (DEA) has become an accepted approach for assessing efficiency in a wide range of cases. The present paper suggests a systematic application procedure of the DEA methodology in its various stages. Attention is focused on the selection of ‘decision making units’ (DMUs) to enter the analysis as well as the choice and screening of factors. The application of several DEA models (in different versions and formulations) is demonstrated, in the process of determining relative efficiencies within the compared DMUs.
Article
This paper provides a sketch of some of the major research thrusts in data envelopment analysis (DEA) over the three decades since the appearance of the seminal work of Charnes et al. (1978) [Charnes, A., Cooper, W.W., Rhodes, E.L., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2, 429–444]. The focus herein is primarily on methodological developments, and in no manner does the paper address the many excellent applications that have appeared during that period. Specifically, attention is primarily paid to (1) the various models for measuring efficiency, (2) approaches to incorporating restrictions on multipliers, (3) considerations regarding the status of variables, and (4) modeling of data variation.
Article
Some problems in economics, operations research, and engineering may be approached by means of a pair of radial DEA models that are nested, i.e., that the set of constraints of one of them is included in that of the other. In this paper we have focused on analyzing the marginal role of a given variable, called candidate, with respect to the efficiency measured by means of a DEA model. First, we have defined a new efficiency contribution measure (ECM), which finally compares the efficiency scores of the two radial DEA models differing in the candidate. This can be either one input or one output. Then, based on ECM, we have also approached the problem from a statistical point of view. To be precise, we have developed a statistical test that allows us to evaluate the significance of the observed efficiency contribution of the candidate. Eventually, solving this test may provide some useful insights in order to decide the incorporation or the deletion of a variable into/from a given DEA model, on the basis of the information supplied by the data. Two procedures for progressive selection of variables were designed by sequentially applying the test: a forward selection and a backward elimination. These can be very helpful in the initial selection of variables when building a radial DEA model.
Article
Within the data envelopment analysis context, problems of discrimination between efficient and inefficient decision-making units often arise, particularly if there are a relatively large number of variables with respect to observations. This paper applies Monte Carlo simulation to generalize and compare two discrimination improving methods; principal component analysis applied to data envelopment analysis (PCA-DEA) and variable reduction based on partial covariance (VR). Performance criteria are based on the percentage of observations incorrectly classified; efficient decision-making units mistakenly defined as inefficient and inefficient units defined as efficient. A trade-off was observed with both methods improving discrimination by reducing the probability of the latter error at the expense of a small increase in the probability of the former error. A comparison of the methodologies demonstrates that PCA-DEA provides a more powerful tool than VR with consistently more accurate results. PCA-DEA is applied to all basic DEA models and guidelines for its application are presented in order to minimize misclassification and prove particularly useful when analyzing relatively small datasets, removing the need for additional preference information.
Article
Data envelopment analysis (DEA) is a data oriented, non-parametric method to evaluate relative efficiency based on pre-selected inputs and outputs. In some cases, the performance model is not well defined, so it is critical to select the appropriate inputs and outputs by other means. When we have many potential variables for evaluation, it is difficult to select inputs and outputs from a large number of possible combinations. We propose an input-output selection method that uses diagonal layout experiments, which is a statistical approach to find an optimal combination. We demonstrate the proposed method using financial statement data from the NIKKEI 500 index.
Article
This paper introduces a decomposition of the Malmquist productivity index into component indexes. The motivation is to derive an analogue of the decomposition of the Törnqvist index into productivity and quality change provided by Fixler and Zieschang (1992) to the Malmquist index. Since we employ no second order approximations, this decomposition requires additional structure, namely a generalized version of Shephard's (1970) inverse homotheticity, which we dub subvector homotheticity. We show that subvector homotheticity is necessary and sufficient for our decomposition.
Article
Efficiency scores of production units are measured by their distance to an estimated production frontier. Nonparametric data envelopment analysis estimators are based on a finite sample of observed production units, and radial distances are considered. We investigate the consistency and the speed of convergence of these estimated efficiency scores (or of the radial distances) in the very general setup of a multi-output and multi-input case. It is shown that the speed of convergence relies on the smoothness of the unknown frontier and on the number of inputs and outputs. Furthermore, one has to distinguish between the output- and the input-oriented cases.
Article
Data Envelopment Analysis (DEA) is a recently developed methodology that is widely used for estimating relative efficiency scores of Decision Making Units (DMUs) that use several inputs to produce several outputs. Model specification in DEA includes aspects such as the choice of inputs and outputs or the adoption of a returns to scale assumption. As pointed out by many authors, it is obvious that the specification of a model is the key to having reliable efficiency scores. In this paper, we are particularly concerned with the selection of variables in DEA models. To be specific, we investigate the performance of several statistical tests existing in the literature that can be used for the selection of variables. In particular, the behaviour of the well-known tests proposed by Banker [2] and the nonparametric tests recently developed by Pastor et al. [13] is analyzed in relation to several factors such as sample size, model size, the specification of returns to scale and the type and level of inefficiency. We have drawn some conclusions that will be of help for practical uses, since the observed behaviour of the tests in the different scenarios determined by the specifications of the mentioned factors may provide some useful insight into the choice of an adequate statistical test in the particular context of a given DEA application.
Article
One of the most important steps in the application of modeling using data envelopment analysis (DEA) is the choice of input and output variables. In this paper, we develop a formal procedure for a “stepwise” approach to variable selection that involves sequentially maximizing (or minimizing) the average change in the efficiencies as variables are added or dropped from the analysis. After developing the stepwise procedure, applications from classic DEA studies are presented and the new managerial insights gained from the stepwise procedure are discussed. We discuss how this easy to understand and intuitively sound method yields useful managerial results and assists in identifying DEA models that include variables with the largest impact on the DEA results.
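A forward variant of such a stepwise procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' exact algorithm: it uses the input-oriented constant-returns (CCR) envelopment model, considers only candidate inputs, stops when the best average gain falls below a tolerance `tol`, and relies on NumPy and SciPy's `linprog`. The names `ccr_scores` and `forward_select` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_scores(X, Y):
    """Input-oriented CCR efficiency of every unit.

    X has shape (n, m) with inputs in columns, Y has shape (n, s) with outputs.
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for k in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
        c = np.concatenate([[1.0], np.zeros(n)])
        # Inputs:  sum_j lambda_j * x_ij - theta * x_ik <= 0
        A_in = np.column_stack([-X[k], X.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_rk
        A_out = np.column_stack([np.zeros(s), -Y.T])
        res = linprog(
            c,
            A_ub=np.vstack([A_in, A_out]),
            b_ub=np.concatenate([np.zeros(m), -Y[k]]),
            bounds=[(0, None)] * (n + 1),
            method="highs",
        )
        scores.append(res.fun)
    return np.array(scores)


def forward_select(X, Y, names, base, tol=1e-6):
    """Greedily add the input column that raises the average score the most.

    `base` lists the column indices always kept in the model. Returns the
    added variables as (name, average gain) pairs in order of inclusion.
    """
    chosen = list(base)
    remaining = [j for j in range(X.shape[1]) if j not in chosen]
    current = ccr_scores(X[:, chosen], Y).mean()
    added = []
    while remaining:
        gains = {
            j: ccr_scores(X[:, chosen + [j]], Y).mean() - current
            for j in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= tol:  # no candidate changes the scores noticeably
            break
        chosen.append(best)
        remaining.remove(best)
        current += gains[best]
        added.append((names[best], gains[best]))
    return added
```

Because adding a variable to a DEA model can never lower any unit's score, the average change is non-negative, and a candidate that is proportional to a variable already in the model produces a change of zero and is never selected.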
Article
The benefits of the forestry promotion programs developed in Argentina from 1990 onward were not taken advantage of in the province of Santiago del Estero, despite its large areas suitable for forestry in both dryland and irrigated zones, because of a lack of information and, consequently, a weak response from producers. This paper pursues two aims: a) to analyze the social and economic consequences of this policy; b) to evaluate the response of producers to those incentives. To do so, field data were collected from a structured survey applied to a sample of 152 farms belonging to the irrigation area of the Río Dulce. This information was summarized by identifying the representative farm types in the area, and mathematical models of economic optimization were designed and solved for each one. The results obtained suggest that the forest incentive policy in Santiago del Estero should be reconsidered.
Impact assessment of input omission on DEA
  • Ruggiero