European Journal of Operational Research 000 (2016) 1–16

Innovative Applications of O.R.

Value added, educational accountability approaches and their effects on schools’ rankings: Evidence from Chile ✩

Claudio Thieme a, Diego Prior b,∗, Emili Tortosa-Ausina c, René Gempp a

a Universidad Diego Portales, Chile
b Department of Business, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
c Universitat Jaume I and Ivie, Spain

Article info

Article history:

Received 27 May 2013

Accepted 15 January 2016

Available online xxx

Keywords:
Efficiency
Order-m

School effectiveness

Value added

Abstract

Value added models have been proposed to analyze different aspects related to school effectiveness on the basis of student growth. There is consensus in the literature about the need to control for socioeconomic status and other contextual variables at student and school level in the estimation of value added, for which the methodologies employed have largely relied on hierarchical linear models. However, this approach is problematic because results are based on comparisons to the average school, implying no real incentive for performance excellence. Meanwhile, activity analysis models to estimate school value added have been unable to control for contextual variables at both the student and school levels. In this study we propose a robust frontier model to estimate contextual value added which merges relevant branches of the activity analysis literature, namely, metafrontiers and partial frontier methods. We provide an application to a large sample of Chilean schools, a relevant country to study given the reforms made to its educational system, which point to the need for accountability measures. Results indicate not only the general relevance of including contextual variables but also how they contribute to explaining the performance differentials found for the three types of schools: public, privately-owned subsidized, and privately-owned fee-paying. The results also indicate that contextual value added models generate school rankings more consistent with the evaluation models currently used in Chile than any other type of evaluation model.

©2016 Elsevier B.V. All rights reserved.

1. Introduction

The development of indicators to evaluate the quality of educa-

tion is a core element of countries’ efforts to implement improve-

ments in their education systems ( Battauz, Bellio, & Gori, 2011 ).

In many countries, this concern has motivated the adoption of ac-

countability systems ( Kane & Staiger, 2002 ), whose main objective

is to evaluate school quality and report these results to parents,

principals, teachers, or policy makers, who will use them to make

choices about schools, to improve their professional practice or

✩ This study was primarily supported by Grant #1121164 and Grant #1151313 awarded to Claudio Thieme, René Gempp and Emili Tortosa-Ausina by FONDECYT (National Fund of Scientific and Technological Development). Diego Prior and Emili Tortosa-Ausina acknowledge the financial support of Ministerio de Ciencia e Innovación (ECO2013-44115-P and ECO2014-55221-P). Emili Tortosa-Ausina also acknowledges the financial support of Generalitat Valenciana (PROMETEOII/2014/046 and ACOMP/2014/283) and Universitat Jaume I (P1.1B2014-17). All four authors are grateful to José Manuel Cordero and three referees whose comments contributed to an overall improvement of the paper. The usual disclaimer applies.

∗ Corresponding author. Tel.: +34 935811539; fax: +34 935812555.
E-mail address: diego.prior@uab.cat (D. Prior).

to develop educational policies. 1 The available empirical evidence

in this regard has contributed to strengthening this tendency, showing that well-designed accountability systems (i.e., those which identify the responsibility attributable to each of the participants in the educational system) enable organizational improvement inside each school ( Rouse, Hannaway, Goldhaber, & Figlio, 2007 ), as well

as optimizing the educational outcomes ( Carnoy & Loeb, 2002;

Hanushek & Raymond, 2005 ). An underlying requisite of any ac-

countability system is to use a robust methodology to disentangle

what share of the students’ achievement can be attributed to the

school, and what share is simply the result of other variables be-

yond the school’s control.

In terms of methodology, the general consensus is that students’ educational achievement depends both on their personal characteristics and on those of their school and context.

1 By way of example, see, for instance, the detailed information regarding school performance in the UK disclosed in http://www.education.gov.uk/schools/performance/ .

http://dx.doi.org/10.1016/j.ejor.2016.01.023

In order to analyze these scenarios, the most common and accepted methodology is multilevel regression models ( McCaffrey,

Lockwood, Koretz, Louis, & Hamilton, 2004 ), also known as hier-

archical linear models, or regression models with random effects

( Goldstein, 2003; Raudenbush & Bryk, 2002 ). The key charac-

teristic of these methods is their capacity to disentangle what

proportion of variance in student achievement can be explained

by student variables (level 1), and what share can be explained by

aggregate, or school, contextual variables (level 2). When multiple

levels are considered, such as hierarchical systems of students

nested in schools, it is possible to obtain a better understanding

and measurement of the causes that explain students’ learning

processes ( Aitkin & Longford, 1986 ). The multilevel approach is

highly relevant when attempting to make decisions, speciﬁc to

each student, school, or context, that contribute useful information

to develop new improvement processes in schools, discourage

managers’ opportunistic behavior, signal a correct resource endowment policy (by establishing rewards and penalties), and make

decisions on public policies.
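As an illustrative sketch (notation ours, not taken from the works cited above), the two-level random-intercept specification underlying this variance decomposition can be written as:

```latex
% Two-level model (illustrative): student i nested in school j, with a
% student-level covariate x_{ij} (level 1) and a school-level
% contextual covariate z_j (level 2).
y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_j + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0,\sigma_u^2), \quad \varepsilon_{ij} \sim N(0,\sigma_\varepsilon^2)
```

The share of achievement variance attributable to schools is then the intraclass correlation \(\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_\varepsilon^2)\).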

The initial stages of research on school accountability were

characterized by the use of cross sectional measures to estimate

school performance (e.g. the mean annual results of standardized

tests), but the current practice is to rely on panel data methods

to evaluate student performance in order to estimate the aca-

demic growth of students throughout their school life—ideally

also controlling for other relevant variables ( Goldstein et al., 1993;

Goldstein & Thomas, 1996; Gray, Jesson, Goldstein, Hedger, &

Rasbash, 1995; Mortimore, Sammons, & Thomas, 1994 ; Sammons,

1995 ). In this context of school accountability, the value added

(VA) of a given school can be broadly deﬁned as the contribution

that it makes to students’ net progress (i.e. to the learning objec-

tives) after the effects of other variables, external to the school,

have been removed ( Meyer, 1997 ). The basic value added model

compares schools’ performance controlling for students’ previous

achievement.
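In schematic form (our notation, following the definition above), the basic VA of school \(j\) with \(n_j\) students is the mean gap between observed end-of-period achievement and the achievement predicted from prior attainment alone:

```latex
% Basic value added (illustrative): mean residual of post-test scores
% after conditioning only on each student's prior achievement.
\mathrm{VA}_j = \frac{1}{n_j} \sum_{i=1}^{n_j}
  \left( y_{ij}^{\mathrm{post}} - \widehat{E}\left[ y^{\mathrm{post}} \mid y_{ij}^{\mathrm{pre}} \right] \right)
```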

More complex value added models are also available, and over

the last few years there has been a growing tendency to use con-

textual VA models, which allow researchers to control for socioeco-

nomic status (SES), ethnic background, gender, and other variables

that are not under the school’s control or responsibility. Thus, con-

textual VA models provide an estimation of the net performance of

schools by removing the effect of previous achievement and other

preexisting differences among students ( Ballou, Sanders, & Wright,

2004 ). It is generally agreed that contextual variables should be

used to estimate VA models, especially when setting some form

of accountability, or when disseminating the results, since results

might be questionable if they do not take into account contex-

tual characteristics of both students and schools. Although there

is no consensus as to what specific contextual variables should be included in the model ( Tekwe et al., 2004 ), socioeconomic status (SES) is usually one of them.

According to their characteristics, VA indicators emerge as an

attractive methodology for several actors interested in measuring

or improving school performance, including: (i) governments

(which need to rely on objective accountability measures); (ii)

politicians (who want to guarantee that the assessment of schools

considers their ethnic and socioeconomic diversity); (iii) re-

searchers (who need to study those factors contributing to school

effectiveness using net indicators, which are not spuriously con-

taminated by the characteristics of students); (iv) teachers and

school managers (who want objective measures of their perfor-

mance, tuned for their speciﬁc student populations); (v) parents

(who need to choose schools for their children according to their

real capacity to add value to their students); and (vi) society as

a whole, since this entails a more accurate and fair evaluation of

the schools in the country ( Drury & Doran, 2003; McCaffrey et al.,

2004 ).

It is also crucial to understand that school effectiveness

studies—including VA analysis—require using some kind of

methodology to compare the schools being evaluated with a

benchmark. In the case of VA research, the most popular method-

ology is multilevel regression models (see, for instance Goldstein

et al., 1993; Gray et al., 1995; Cervini, 2009; Goddard & Goddard,

2001 ). An implicit assumption of this approach is to use the av-

erage school as a benchmark. However, this approach is not free

from criticisms such as, for instance, that using the average as a

benchmark is not an incentive for excellence ( Bock, Wolfe, & Fisher,

1996; Kupermintz, 2003; McCaffrey, Lockwood, Koretz, & Hamilton,

2003 ), that traditional VA models require test scores to be vertically scaled, and that the appropriate functional form for the model is not guaranteed in advance ( Murphy, 2012; Ray, Evans, & McCormack, 2009 ).

An attractive approach to overcome this criticism is to consider

the models derived from the activity analysis literature, which

evaluate school performance by comparing any given school with

the best observed performance. Instead of using a regression line

as a benchmark, these methodologies consider a nonparametric

frontier built either using Data Envelopment Analysis (DEA), or

its nonconvex variant, namely, Free Disposal Hull (FDH). 2 In addition to explicitly defining an optimal benchmark, frontier models

also allow several outputs to be used simultaneously (i.e. several

concurrent measures of student and school performance), offering

greater ﬂexibility to estimate VA.
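As a rough sketch of how such a nonparametric frontier works (an illustrative FDH implementation of ours, not the paper's estimator), the output-oriented FDH score of a unit is the largest factor by which an observed peer, using no more of any input, covers all of its outputs:

```python
import numpy as np

def fdh_output_efficiency(X, Y):
    """Output-oriented FDH scores (illustrative sketch).

    X: (n, p) array of inputs; Y: (n, q) array of positive outputs.
    A score of, e.g., 2.0 means an observed peer using no more of any
    input produces at least twice every output of the evaluated unit."""
    n = X.shape[0]
    scores = np.empty(n)
    for k in range(n):
        # peers using no more of any input than unit k (k dominates itself)
        peers = np.all(X <= X[k], axis=1)
        # for each peer, the factor by which it covers all of k's outputs
        ratios = np.min(Y[peers] / Y[k], axis=1)
        scores[k] = ratios.max()  # best attainable proportional expansion
    return scores
```

A score of 1 marks a unit on the FDH frontier; scores above 1 measure how far below the best observed performance the unit lies.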

In this line of research, there has been a growing interest in

developing approaches to estimate school effectiveness. For in-

stance, Silva Portela and Thanassoulis (2001) , De Witte, Thanas-

soulis, Simpson, Battisti, and Charlesworth-May (2010) and Portela,

Camanho, and Keshvari (2013) have developed methodologies to estimate basic VA models, whereas Thieme, Prior, and Tortosa-Ausina (2013) have proposed a model to analyze contextual ef-

fects in multilevel settings with cross-sectional data. However,

the existing methodologies have not been able to estimate con-

textual VA, namely, to develop a frontier model able to esti-

mate school VA effects controlling for students’ previous achieve-

ment, and also for contextual variables at student and school lev-

els. This development is crucial to further explore the use of

frontier models to estimate contextual VA models in real world

applications.

For this reason, the aim of this paper is both empirical and, to a

lesser degree, methodological. Regarding the latter (at the method-

ological level) we propose a robust frontier model to estimate

contextual value added (CVA) which combines both methodolog-

ical contributions from multilevel modeling to school VA, as well

as relevant proposals in the ﬁeld of activity analysis methods—

namely, the so-called metafrontiers ( Battese, Rao, & O’Donnell,

2004 ) as well as the partial frontier methods ( Cazals, Florens, &

Simar, 2002 ).

Regarding the former (at the empirical level), we use this novel

approach to analyze school effectiveness in Chile. This applica-

tion is especially relevant for this country which, since the 1980s,

has been implementing a series of reforms to its educational sys-

tem (see Mizala & Romaguera, 2000 ), with strong emphasis on

accountability measures. Among other reforms, the government

transferred the management of public schools from the Ministry

of Education to city councils, and allowed for the participation of

private schools in the public system through a voucher system.

Simultaneously, an accountability system was created, consisting

2 We can also find parametric variants of this literature, among which SFA (Stochastic Frontier Analysis) is the most popular. Parametric and nonparametric

methods have both advantages and disadvantages, some of which have been re-

cently outlined by Badunenko, Henderson, and Kumbhakar (2012) .


of national standardized tests of educational achievement applied

annually to all students in 4th, 8th or 10th grade. The average

school results of this assessment, called the SIMCE test ( Sistema

de Medición de la Calidad Educativa , or Measurement System of Ed-

ucational Quality), are reported to parents, and used by the Min-

istry of Education as a measure of school quality. More recently,

some new laws have been passed which further emphasize the use of accountability measures. Specifically, Law #20529 establishes a new educational quality assurance system ( Sistema Nacional de Aseguramiento de la Calidad de la Educación ) and a

new national agency of education quality ( Agencia de Calidad de la

Educación ). This public institution should classify educational institutions into four groups according to their effectiveness and performance. Specifically, in its article 18 this law

indicates that the classiﬁcation should be based primarily on the

SIMCE average results achieved by the educational institutions. In

addition to this, the law considers that the procedure to make the

classiﬁcation could eventually include VA models, stating explicitly

that the agency should consider the results from the learning pro-

cess in all the evaluated areas as well as the characteristics of the

school’s students including, among others, their vulnerability and,

when applicable, indicators of progress or value added (article 17) .

Currently this agency has developed several alternative methodolo-

gies for performing the classiﬁcation of schools, but none of them

has included measures of value added—despite the fact that the

Law #20529 contemplates it, and it is a possible alternative con-

sidered in some research initiatives. Moreover, the methodologies

considered correspond to variations in what we will later refer to

as Model 1.

The resulting classification will obviously be contingent on the different models chosen, the estimation procedures, and data availability. Therefore, these definitions might have

important consequences for Chilean schools, among which we

should consider that this Law also contemplates measures ranging from intervention in organizational aspects to the closure of low-performing schools. This article aims, among other issues, to con-

tribute to the existing debate in Chile on this issue, comparing the

effect on the classiﬁcation of schools using value-added vs. status

models (such as those that the agency is currently contemplating).

This is possible due to a particularity of the SIMCE test calendar: 2009 was the first year in which students taking the SIMCE test in 8th grade had also taken it in 4th grade, in 2005.

This scenario allows us to apply our model to a large sample—

47,076 students from 948 primary schools. All students took Math-

ematics and Language SIMCE tests. The sample was made up of 4th

and 8th grade students (9 and 13 years old, respectively), for which

we have socioeconomic information on their families (at student

level), obtained via a questionnaire for parents. In an attempt to

achieve a reliable and homogeneous sample at school level, we only included students who took both exams and for whom we had socioeconomic information, and whose schools had more than 30 students meeting these criteria, corresponding to 60 percent of the students at the school who took the exam in 2009.

The rest of the article is organized as follows. In Section 2 we

describe the relevant theoretical framework. In Section 3 we de-

tail the methodology used. The background of the empirical appli-

cation and the description of the database used are presented in

Section 4 . The comparative results between models are discussed

in Section 5 , and the main conclusions of the study are outlined in

Section 6 .

2. Theoretical framework

2.1. The assessment model

All national or state accountability systems attempt to improve

learning and instruction processes, but they differ signiﬁcantly in

the way they control for both the quality and progress of schools.

This heterogeneity leads to different perceptions of which schools

should be rewarded, and which should be encouraged to improve,

among other recommendations.

There are several frameworks to classify these evaluation mod-

els ( Carlson, 2001 ), but most of them take into account two fun-

damental aspects. First, one may distinguish between two dif-

ferent approaches for monitoring school performance, namely,

status models and growth models. Status models use a single year

to evaluate students’ academic achievement (i.e. cross sectional

data), whereas growth models use two or more years (i.e. panel

data). Second, in both approaches one may distinguish between

models that use contextual variables (both at student and school

level) to evaluate school achievement, and those which do not. This

scenario, and the core questions that these models try to answer,

can be represented in a 2 × 2 matrix, as in Table 1 .

As Table 1 shows, Model 0 (also referred to as a type 0 model) considers for the evaluation only the outputs related to students’ academic achievement in a given time period. In the lit-

erature on education these models are usually referred to as aca-

demic achievement status models without contextual variables. As

indicated by Tekwe et al. (2004) , the distinguishing characteristic

of status-based models is the absence of adjustment for students’

incoming knowledge level. This implies that differences among schools in the average knowledge of their incoming students are conflated with the assessment of teaching quality. An implicit assumption is that all students and schools have

optimal and similar backgrounds. Therefore, accountability systems

based on this model consider that students’ academic achievement

is entirely attributable to schools, disregarding evidence in the lit-

erature that a large share of students’ academic achievement might

be attributable to contextual factors, which are non-controllable,

and not attributable to the school itself ( Teddlie & Reynolds, 2000 ).

The strongest criticisms suggest that this model could generate

perverse incentives for the attainment of the objective being pur-

sued, allocating fewer resources to those students with relatively

Table 1
Types of evaluation models.

Status (one student assessment):
- Model 0 (status), without contextual variables: What is the level of academic achievement of the students in this school?
- Model 1 (contextual status, CS), with contextual variables: What is the level of academic achievement of the students in this school, according to the students’ and/or school’s contextual factors?

Value added (two or more student assessments):
- Model 2 (VA, value added), without contextual variables: Is this an effective school? Given the achievement of students upon enrollment, how much do they learn or develop while they are at school?
- Model 3 (CVA, contextual value added), with contextual variables: Is the school effective? Given students’ achievement level upon entrance, how much do they learn, or develop, while they are at school, according to either the students’ or school’s contextual factors?


worse results who do not help their schools to achieve their ob-

jectives. Simultaneously, this could generate selection of students

within schools, or lead to self-selection ( Wilson, 2004 ). In spite of

these disadvantages, we analyze Model 0 as a first step, because it has been the approach used in Chile to evaluate schools since 1999. Therefore, it is of interest to compare its results with those

that could be yielded by other models proposed in this study.

Model 1 extends the variables considered. While, analogously

to model 0, it includes the outputs related to students’ academic achievement at a given moment in time, it also considers input

variables not attributable to the school, either at student or at

school level. This model corresponds to an academic achievement

status model with contextual variables, according to the literature

on education. A recent example of this type of approach is the

study by Thieme et al. (2013) , which proposes a multilevel model

incorporating contextual variables both at student and school lev-

els. Despite the remarkable progress it represents, by not considering students’ initial academic achievement as an input it assumes that this achievement is the same, and optimal, for all students, an assumption far from reality which could lead to misinterpre-

tation of results. It should be noted that the methodologies recom-

mended by the Chilean agency for the quality of education (“Agen-

cia de Calidad de la Educación”) for the classiﬁcation of schools

correspond to this family of models.

Model 2 corresponds to a “pure” value-added model; the only

inputs and outputs it considers are the results of students’ aca-

demic achievement, both at the beginning and at the end of the

educational process under evaluation. The educational research lit-

erature considers that value added measures (gain) are more in-

formative measures of the effectiveness of institutions, since they allow the school’s effect on students’ progress to be isolated ( Wilson & Piebalga, 2008 ), and contribute to reducing incentives

for dishonest behavior. Two relatively recent studies are consistent

with this model, namely, Silva Portela and Thanassoulis (2001) and

De Witte et al. (2010) . However, as in the previous model, they

have the disadvantage of not controlling for non-school elements

which inﬂuence this particular process.

Model 3 overcomes the disadvantages of models 1 and 2. In the

education research literature this type of model emerged strongly

as a reﬁnement of measures of growth, and has been called CVA

(contextual value added). The CVA was ﬁrst used in 2006 in British

schools, and it is a measure intended to isolate the real impact of

the school on students’ progress. This type of modeling involves

obtaining results that consider a number of factors such as gender,

ethnicity, and language of origin, among others. The difference be-

tween the model estimate and the result that the student actually

achieves is what is referred to as CVA ( Wilson & Piebalga, 2008 ).
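Schematically (our notation, not that of the cited work), the CVA of student \(i\) in school \(j\) is the gap between observed achievement and the achievement predicted once prior attainment and contextual factors enter the prediction:

```latex
% Contextual value added (illustrative): residual after conditioning on
% prior achievement and contextual variables c_{ij} (e.g. SES),
% possibly including school averages \bar{c}_j.
\mathrm{CVA}_{ij} = y_{ij} - \widehat{E}\left[ y \mid y_{ij}^{\mathrm{pre}},\, c_{ij},\, \bar{c}_j \right]
```

The school-level CVA then aggregates these residuals over the school's students.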

There is an open discussion about this point, with arguments

for and against using several contextual variables in VA models

( Timmermans, Doolaard, & de Wolf, 2011; Willms & Raudenbush,

1989 ). In our case, we have decided to include only the SES, ba-

sically for three reasons. First, it is important to understand that,

in the Chilean context, the most important contextual variable ex-

plaining school performance is SES, as repeatedly demonstrated

in many studies ( Auguste & Valenzuela, 2003; Bellei, 2009; Con-

treras, Sepúlveda, & Bustos, 2010; Gauri, 1999; Hsieh & Urquiola,

2006 ). For instance, Manzi, Strasser, San Martín, and Contreras

(2008) show that a very large and stable percentage of between-

school variance is accounted for by socioeconomic factors; in fact,

over 60 percent of this variance is explained by the combination

of the individual SES and school average SES. This result is con-

sistent with PISA decomposition of variance ( OECD, 2010 ), which

shows that Chile is one of the countries with the largest percent-

age of between-school variance explained by socioeconomic fac-

tors. Manzi et al. (2008) also show that the effect of other variables

is negligible, and that the type of school does not explain a rele-

vant share of between school variance once socioeconomic factors

are controlled for. These results corroborate the extent of socioe-

conomic segregation in the Chilean educational system (see also

Carrasco & San Martín, 2012; Valenzuela, Bellei, & Ríos, 2014 ).

Considering the previous arguments, it is important to under-

stand that, in the case of Chile, the most important contextual vari-

able to analyze school performance is SES both at individual and

school (average) levels; for this reason we include these variables and no others. Second, and from a broader perspective,

models with different types of contextual variables usually yield

similar results ( Timmermans et al., 2011 ), which would include

both basic and advanced value added models ( Harris, 2011 ). How-

ever, interestingly, although different VA models might use differ-

ent contextual variables, there is a wide consensus on the inclu-

sion among them of SES (e.g. see the OECD report on VA models

in different countries). Third, using SES as the only control variable

enables achieving a balance between the amount of information

used and the sample size. It is debatable if the increased amount of

information considered when including more contextual variables

offsets the problems derived from the curse of dimensionality, a

relevant issue to control for in nonparametric models for eﬃciency

measurement ( Simar and Wilson, 2008 , p. 441, chap. 4). Actually,

in our particular case the number of contextual variables which we

summarize in our composite index is very high (4) and, therefore,

we consider this to be a case in which a composite index may be

particularly convenient.

Despite the great advances that models 1 and 2 represent,

when activity analysis methods are considered to evaluate them,

model 3 in Table 1 best isolates the real impact of the school

on students’ progress, as indicated by many contributions from

the traditional literature on school evaluation—which generally use

parametric multilevel analysis. Therefore, evaluating model 3 con-

sidering activity analysis methods has some unexplored advantages

that will be part of our aims. It should be considered that, up to

now, the Chilean agency for the quality of education has not con-

sidered type 2 or type 3 models for the classiﬁcation of schools.

2.2. The evaluation methodology

As indicated in the introduction, in recent years there has been

considerable progress in the evaluation methodology of school

performance, especially regarding the development of multilevel

models ( Bryk & Raudenbush, 1992; Goldstein, 1995 ). The general

concept is that students’ academic achievement depends on their

personal characteristics, and the characteristics of the school,

and its context. To analyze these situations, the different levels

are considered as hierarchical systems of students and schools,

with individuals and groups deﬁned in separate hierarchies, using

variables that are deﬁned at each level ( Hox, 2002 ).

This signiﬁcant progress can solve the main methodological

problem of the pioneering studies in this ﬁeld, by breaking down

the various nested effects that explain students’ educational out-

comes. The percentage of student achievement due to the different

variables at different organizational levels—district, school, class,

and student—can also be determined.

In this particular area, there are many statistical models for es-

timation which differ in several regards such as the deﬁnition and

inclusion of adjustment variables ( Tekwe et al., 2004 ). However,

the most prominent position is to include adjustment variables,

especially when establishing some form of accountability or disseminating the results, since equity is questionable if the

background characteristics of students and schools are not taken

into account ( McCaffrey et al., 2004; McCaffrey et al., 2003 ).

Despite the many methodological and empirical contributions,

this research is not without its criticisms ( Kupermintz, 2003; McE-

wan, 2003 ). One of them relates to the nature of their estimates as comparisons with the average, implying no real incentive for performance excellence. Indeed, the vast majority of value-added studies use multilevel regression analysis.

An alternative is found in the models that consider activity

analysis techniques, mainly using nonparametric frontier methods

(mostly Data Envelopment Analysis, DEA, and its nonconvex coun-

terpart, Free Disposal Hull, FDH). They provide relevant advantages, such as the ability to compare against the best observed performance and, more importantly, the possibility of specifying several inputs and outputs simultaneously. In the field of education many studies have adopted these

techniques (see, for instance Mizala, Romaguera, and Farren, 2002;

Thanassoulis, Kortelainen, Johnes, and Johnes, 2011; Portela, Ca-

manho, and Keshvari, 2013; Johnes and Johnes, 2009 , among oth-

ers). However, these methods are not exempt from general criti-

cisms. On the one hand, regarding the nature of these methods,

their deterministic and probabilistic features, the curse of dimen-

sionality, or their heavy reliance on the absence of outliers have

been a source of continuous concern. On the other, in the speciﬁc

ﬁeld of education, their main disadvantage has been to consider

only student-level data, which would yield estimations that incor-

rectly assume that schools are operating with the optimal endowment of inputs (both controllable and uncontrollable), thus failing to establish a multilevel analysis.

From the perspective of the nonparametric deterministic technology, Cazals et al. (2002) pointed out the problem of the lack of statistical properties and the impact that the presence of outliers can cause. From the perspective of student data, another criticism is that multilevel analysis ignores the resources schools allocate in order to sustain the education process (McCaffrey et al., 2003, 2004). Both problems are addressed in this paper by providing an integrated approach combining metafrontier approaches (Battese et al., 2004) with robust partial order-m frontiers (Cazals et al., 2002).

Some previous research initiatives have taken these considerations into account. Specifically, our aims and methods are consistent with previous literature such as Silva Portela and Thanassoulis (2001) who, following model 2, decompose overall efficiency into two different effects, namely, the school effect and the student-within-school effect. Later on, De Witte et al. (2010) refined this methodology, proposing a robust approach based on Cazals et al.'s (2002) ideas for the estimation. A more recent contribution by Thieme et al. (2013), based on model 1, considered both ideas, performing a multilevel decomposition in which additional variables are factored in so as to provide a more comprehensive analysis. However, despite their interest, these previous studies disregard the existence of contextual factors, at both the student and school level, in the assessment of school performance.

In contrast, our proposal here is based on the definition of a contextualized value-added robust multilevel nonparametric frontier assessment that separates the net effects of student and school, controlling for socioeconomic status at both the student and school level, and eliminating (or at least drastically reducing) the potential problems caused by the existence of outliers and by dimensionality, as will be explained in the following section.

3. Methodology

3.1. The decomposition of overall efficiency

Following the rationale described in the above paragraphs, our model is inspired by Silva Portela and Thanassoulis's (2001) initial contribution, in which two frontiers are considered, namely, the local and the global frontier. Whereas the former is specific to each school, and oriented to an estimation of student-within-school efficiency, the latter is used to estimate student-within-all-schools efficiency. The so-called student's effect (henceforth STE), or student's efficiency, is determined by the distance to the local frontier. In contrast, the school's effect (henceforth SCE), or school's efficiency, refers to the distance separating the local and the global frontiers. Model 2 in Fig. 1 documents the ideas underlying both effects.

In Fig. 1, student c achieves an output level represented by y_c, corresponding to an input level x_c, the score achieved by the student in a previous academic year. When student c's academic performance is compared with the local frontier (which corresponds to the school in which student c is enrolled, i.e. school d), one may notice that student c is inefficient. This occurs because there are more efficient students enrolled in the same school d who achieve better results (y_2) using the same inputs, or previous knowledge (x_c). Therefore, the student's effect, or what Silva Portela and Thanassoulis (2001) refer to as the student-within-school's efficiency, is determined as the ratio of the potential to the actual output, i.e. STE_2 = y_2/y_c. This student's effect is higher than unity when the student is inefficient (as in the case presented in Fig. 1), and equal to unity otherwise. The efficiency coefficient for the student under analysis will be OE_2 = ȳ_2/y_c when compared to the overall frontier, or the student-within-all-schools' efficiency in the terms used by Silva Portela and Thanassoulis (2001). Having these two reference frontiers, the school's effect (SCE_2, a sort of technology-gap ratio separating the school-specific frontier from the overall frontier) is determined by comparing the overall and local frontiers (SCE_2 = ȳ_2/y_2 = OE_2/STE_2).

In summary, the proposal of Silva Portela and Thanassoulis (2001) decomposes the global efficiency into two effects, namely:

Overall efficiency (OE_2) = Student's effect (STE_2) × School's effect (SCE_2)    (1)
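As a quick numeric illustration of Eq. (1), consider the following sketch (all values are hypothetical, chosen only to show how the three coefficients relate):

```python
# Hypothetical scores for one student: y_c is the observed output,
# y2 the local (within-school) frontier benchmark, and y2_bar the
# overall (all-schools) frontier benchmark, as in Fig. 1.
y_c, y2, y2_bar = 50.0, 60.0, 75.0

STE2 = y2 / y_c       # student's effect: 60/50 = 1.20
SCE2 = y2_bar / y2    # school's effect:  75/60 = 1.25
OE2 = y2_bar / y_c    # overall efficiency: 75/50 = 1.50

# Eq. (1): overall efficiency factors exactly into the two effects.
assert abs(OE2 - STE2 * SCE2) < 1e-12
```

A student's effect of 1.20 means the student's outputs could rise by 20 percent within her own school; the remaining gap to the overall frontier is attributed to the school.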

According to the taxonomy proposed in Table 1, this decomposition corresponds to model 2, or pure value added. As mentioned above, we partially follow this proposal, as we are interested in the student as the unit of analysis. However, in contrast to Silva Portela and Thanassoulis (2001), our aim is to develop a multilevel decomposition of contextual value added (CVA). This implies considering not only students' academic results, but also contextual factors regarding students and schools. To that end, we follow a previous proposal (Thieme et al., 2013), classifiable as a type 1 model (considering only the contextual status but not the value added; see Table 1), which introduces successive decompositions after the consideration of specific variables. The final effect is the modification of the school effect after introducing contextual variables on the average socio-economic level of the parents of students attending the same school.

In Fig. 1 we illustrate the differences between model 2 (consisting of the assessment of the pure value added) and our proposal, which gives rise to model 3 (a contextual value added assessment). Regarding the student's effect, we see that y_2 > y_3 because in model 3 we consider as inputs not only the previous scores but also the socio-economic and cultural level of the student's family. This implies that, in order to estimate y_3, we consider the student's family context, whereas y_2 implicitly assumes that this context does not interfere with students' scores. In other words, benchmark y_3 comes from a student who, having the same previous scores and a comparable socio-economic situation, achieves a better academic outcome. In contrast, benchmark y_2 is that corresponding to another student with a better socio-economic situation. The contextual variables also have an impact on the school effect. Indeed, it is also clear that ȳ_2 > ȳ_3, since the contextual socio-economic environment could also affect the student's achievement in ȳ_3.

Summing up, when comparing models 2 and 3 we see that part of what is considered as student's inefficiency (STE_2), according to

Fig. 1. Decomposition of the pure Value Added (model 2) and Contextual Value Added (model 3).

model 3, is attributable to both the effect of the contextual variables (STCE_3) and the net student's effect (STE_3). Analogously, the school's effect from model 2 (SCE_2) can be decomposed in order to account for the impact of the context due to socio-economic factors (SCCE_3) and the net effect of the school (SCE_3). This implies that a potential technology gap (represented by both STCE_3 and SCCE_3) appears when the context has a significant impact on the scores that students can achieve. In model 3 this gap may, or may not, be significant, whereas in model 2, by definition, the impact is nonexistent. To conclude, the decomposition corresponding to model 3 can be expressed as:

Overall efficiency = Contextual effect on efficiency × Net overall efficiency
= Contextual effect on efficiency × Student's effect × School's effect    (2)

or:

OE = (ȳ_2/y_c) = (y_2/y_3) × (ȳ_2/ȳ_3) × (y_3/y_c) × (ȳ_3/y_2)    (3)
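A quick numeric check that the four ratios in Eq. (3) telescope to the overall efficiency (values are hypothetical; marking the overall-frontier benchmarks with an overbar, `_bar` in the code, is an assumption of this sketch):

```python
# Hypothetical benchmarks: y3 <= y2 (local frontiers, models 3 and 2),
# y3_bar <= y2_bar (overall frontiers), and observed score y_c.
y_c = 50.0
y3, y2 = 55.0, 60.0          # local frontier benchmarks
y3_bar, y2_bar = 66.0, 75.0  # overall frontier benchmarks

OE = y2_bar / y_c
decomposed = (y2 / y3) * (y2_bar / y3_bar) * (y3 / y_c) * (y3_bar / y2)

# The intermediate benchmarks cancel, leaving y2_bar / y_c.
assert abs(OE - decomposed) < 1e-9
```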

3.2. Using partial frontiers

When estimating inefficiency levels, the first decision to be made is the specification of the technology, which has relevant implications, as different technologies could lead to different results. Many previous applications have considered DEA models, implying that a convex technology is assumed, i.e. each inefficient student will be compared to her more efficient peers, or to combinations of them. In contrast, FDH models require comparison with an existing student, and linear combinations are not allowed as a benchmark, i.e. the convexity assumption is dropped.

Both DEA and FDH have some shortcomings, among which we may highlight the so-called curse of dimensionality and the potential impact of outliers. Some studies have established the statistical properties of the FDH estimator (Kneip, Park, & Simar, 1998; Simar & Wilson, 2000), indicating that the dimensionality problems of FDH models originate from their slow convergence rates. However, their statistical properties are very appealing, since they are consistent estimators for any monotone boundary, i.e. imposing only strong disposability. In addition, as shown by

Park, Simar, and Weiner (2000), FDH has additional advantages over convex models, since the latter cause a specification error when the true technology is nonconvex.³

In our study we will assume nonconvex technologies, which in our particular setting implies that an existing student will be compared with another existing, albeit more efficient, student. However, FDH approaches also have some limitations and, therefore, we will consider a partial frontier approach, order-m (Cazals et al., 2002), which is much more robust to both outliers and the curse of dimensionality.⁴

Accordingly, we assume there is information available on the input and output vectors (x_c = (x_{c,1}, x_{c,2}, ..., x_{c,i}, ..., x_{c,I}) and y_c = (y_{c,1}, y_{c,2}, ..., y_{c,j}, ..., y_{c,J}), respectively) for each student in the sample (1, 2, ..., C). We will then characterize the elements of the integer activity vector as λ = (λ_1, λ_2, ..., λ_C) and the efficiency coefficient as α_c^FDH.

Then, the output-oriented FDH efficiency scores are obtained by solving the following integer programming problem:

max_{α_c^FDH, λ_1, λ_2, ..., λ_C}  α_c^FDH,

s.t.  x_{c,i} − Σ_{s=1}^{C} λ_s x_{s,i} ≥ 0,  i = 1, ..., I,

      Σ_{s=1}^{C} λ_s y_{s,j} − α_c^FDH y_{c,j} ≥ 0,  j = 1, ..., J,

      Σ_{s=1}^{C} λ_s = 1,

      λ_s ∈ {0, 1},  s = 1, ..., C.    (4)

Integer problem (4) identifies, for each student c that is not FDH-efficient, another student in the sample with better performance, i.e. the student with coefficient λ_{s*} = 1. It then estimates the output increase (α_c^FDH − 1) which is needed to reach the nonconvex frontier, with α_c^FDH > 1. For FDH-efficient students, solving integer problem (4) yields an activity vector with λ_c = 1 as well as an efficiency coefficient α_c^FDH = 1.
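Because the activity vector is binary and sums to one, problem (4) can also be solved by simple enumeration: the score of student c is the largest radial output expansion supported by any observed student using no more of every input. A minimal sketch (NumPy is assumed and the function name is ours):

```python
import numpy as np

def fdh_output_score(X, Y, c):
    """Output-oriented FDH score of unit c, solving problem (4) by enumeration.

    X: (C, I) array of inputs; Y: (C, J) array of strictly positive outputs.
    Returns alpha >= 1; alpha == 1 means unit c is FDH-efficient.
    """
    alpha = 1.0
    for s in range(X.shape[0]):
        if np.all(X[s] <= X[c]):                     # s uses no more of any input
            alpha = max(alpha, np.min(Y[s] / Y[c]))  # feasible radial expansion at s
    return alpha

# Tiny illustration: student 1 dominates student 0 (same input, higher output),
# while student 2 uses more input and is therefore not a valid benchmark for 0.
X = np.array([[10.0], [10.0], [12.0]])
Y = np.array([[50.0], [60.0], [70.0]])
print(fdh_output_score(X, Y, 0))  # 1.2: student 0 could raise output by 20 percent
print(fdh_output_score(X, Y, 1))  # 1.0: student 1 is FDH-efficient
```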

As indicated earlier, there are contributions in the literature that provide methods to overcome the curse of dimensionality and the effect of outliers inherent to FDH. Among them, the order-m estimator (Cazals et al., 2002; Simar, 2003) has become one of the most popular methods to get round these issues while at the same time maintaining the advantages of a nonconvex and nonparametric methodology.

According to this method, we will first consider a positive fixed integer, m. For a given level of input (x_{c,i}) and output (y_{c,j}), the order-m estimation defines the expected value of the maximum of m random variables (y_{1,j}, ..., y_{m,j}), drawn from the conditional distribution of the output matrix Y for which y_{m,j} > y_{c,j}. Formally, the proposed algorithm to compute the order-m estimator has four steps:

1. For a given level of y_{c,j}, draw a random sample of size m with replacement among those y_{m,j} such that y_{m,j} ≥ y_{c,j}.

2. Compute program (4) and estimate α_c.

3. Repeat steps 1 and 2 B times and obtain B efficiency coefficients α_c^b (b = 1, 2, ..., B). The quality of the approximation can

³ Some authors, such as Thanassoulis and Silva Portela (2002), have proposed other methods to overcome the issue of outliers, by identifying and eliminating the extreme (super-efficient) cases; however, this is a controversial approach, since these units can convey relevant information. In the particular context of education, eliminating super-efficient observations could lead to an increase of overall efficiency, magnifying mediocrity and reducing the potential efficiency gains that could be achieved.

⁴ In a recent paper, Krüger (2012) ranked the order-m estimation method as dominated, under general conditions, by the stochastic frontier, DEA and FDH methods. Thus, it appears that its use should be restricted to those cases characterized by the significant presence of outliers. This is precisely the case of education, where some students with very limited resources may achieve a brilliant academic record.

be tuned by increasing B (in most applications B = 200 seems to be a reasonable choice, but we decided to set B = 2000).

4. Compute the empirical mean of the B samples as:

α_c^m = (1/B) Σ_{b=1}^{B} α_c^b    (5)
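The four steps can be sketched as follows (a deliberately simplified single-output-orientation version: we resample only among output-dominating units as in step 1, score each draw by the same enumeration used for FDH, and average as in Eq. (5); the floor at one means super-efficiency below unity is not reproduced here; NumPy and the function name are our assumptions):

```python
import numpy as np

def order_m_score(X, Y, c, m=25, B=2000, seed=0):
    """Approximate output-oriented order-m score of unit c (steps 1-4).

    For each of B replications, draw m units with replacement among
    those whose outputs weakly dominate unit c, solve the FDH-type
    comparison on the draw, and return the mean of the B coefficients.
    """
    rng = np.random.default_rng(seed)
    pool = np.flatnonzero(np.all(Y >= Y[c], axis=1))   # step 1 candidates
    scores = np.empty(B)
    for b in range(B):
        draw = rng.choice(pool, size=m, replace=True)  # step 1
        alpha = 1.0                                    # step 2 (enumeration)
        for s in draw:
            if np.all(X[s] <= X[c]):
                alpha = max(alpha, np.min(Y[s] / Y[c]))
        scores[b] = alpha                              # step 3
    return scores.mean()                               # step 4, Eq. (5)
```

With a small m, only part of the dominating sample enters each comparison, which is what softens the influence of atypical observations relative to the full FDH frontier.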

The number of observations considered in the estimation approaches the number of observed units that meet the condition y_{m,j} > y_{c,j} as m increases, whereas the expected order-m estimator in each of the b iterations, α_c^b, tends to the FDH efficiency coefficient α_c^FDH. Therefore, m is an arbitrary positive integer value, but it is always convenient to observe the fluctuations of the α_c^b coefficients, which will ultimately depend on the level of m. α_c^m will normally take values higher than unity for acceptable values of m (this indicates that these units are inefficient, as outputs can be increased without modifying the inputs allocated). When α_c^m < 1, unit c may be labeled as super-efficient, provided the order-m frontier shows lower output levels than the unit under analysis.

In order to carry out the multilevel estimation, we adapt the metafrontier approaches proposed by Battese and Rao (2002), Battese et al. (2004) and O'Donnell, Prasada Rao, and Battese (2008). In the case of model 3, this process has the following steps:

(a) Classify students (1, 2, ..., C) depending on the school they are enrolled in (1, 2, ..., D).

(b) Complete steps 1 to 4 of the order-m algorithm to estimate the efficiency coefficients corresponding to each student in the specific school she/he is enrolled in (α_c^m), and with regard to the overall frontier (that is, considering the school frontier point represented by y_2 in Fig. 1 and the efficiency corresponding to the overall frontier ȳ_2).

(c) After completing step (b), add new input variables (the socio-economic and cultural level corresponding to the student's family and the average of the same variable for the school) and apply again steps 1 to 4 of the order-m estimation to the complete sample to estimate the efficiency coefficients with respect to the metafrontier (ȳ_3) and with the local frontier y_3. These new coefficients provide an assessment of the student's efficiency with respect to the overall metafrontier, taking into account only schools operating with no better environmental factors than the school where the student is enrolled (precisely what is represented by point y_3