Value added, educational accountability approaches and their effects on schools’ rankings: Evidence from Chile

DOI: 10.1016/j.ejor.2016.01.023
ARTICLE IN PRESS
JID: EOR [m5G; March 27, 2016;14:8 ]
European Journal of Operational Research 000 (2016) 1–16
Contents lists available at ScienceDirect
European Journal of Operational Research
journal homepage: www.elsevier.com/locate/ejor
Innovative Applications of O.R.
Value added, educational accountability approaches and their effects
on schools’ rankings: Evidence from Chile
Claudio Thieme a, Diego Prior b,∗, Emili Tortosa-Ausina c, René Gempp a
a Universidad Diego Portales, Chile
b Department of Business, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
c Universitat Jaume I and Ivie, Spain
Article info
Article history:
Received 27 May 2013
Accepted 15 January 2016
Available online xxx
Keywords:
Efficiency
Order-m
School effectiveness
Value added
Abstract

Value added models have been proposed to analyze different aspects related to school effectiveness on the basis of student growth. There is consensus in the literature about the need to control for socioeconomic status and other contextual variables at student and school level in the estimation of value added, for which the methodologies employed have largely relied on hierarchical linear models. However, this approach is problematic because results are based on comparisons to the school's average—implying no real incentive for performance excellence. Meanwhile, activity analysis models to estimate school value added have been unable to control for contextual variables at both the student and school levels. In this study we propose a robust frontier model to estimate contextual value added which merges relevant branches of the activity analysis literature, namely, metafrontiers and partial frontier methods. We provide an application to a large sample of Chilean schools, a relevant country to study due to the reforms made to its educational system, which point to the need for accountability measures. Results indicate not only the general relevance of including contextual variables but also how they contribute to explaining the performance differentials found for the three types of schools—public, privately-owned subsidized, and privately-owned fee-paying. Also, the results indicate that contextual value added models generate school rankings more consistent with the evaluation models currently used in Chile than any other type of evaluation models.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
The development of indicators to evaluate the quality of education
is a core element of countries' efforts to implement improvements
in their education systems (Battauz, Bellio, & Gori, 2011).
In many countries, this concern has motivated the adoption of ac-
countability systems ( Kane & Staiger, 2002 ), whose main objective
is to evaluate school quality and report these results to parents,
principals, teachers, or policy makers, who will use them to make
choices about schools, to improve their professional practice or
This study was primarily supported by Grant #1121164 and Grant #1151313
awarded to Claudio Thieme, René Gempp and Emili Tortosa-Ausina by FONDECYT
(National Fund of Scientific and Technological Development). Diego Prior and Emili
Tortosa-Ausina acknowledge the financial support of Ministerio de Ciencia e
Innovación (ECO2013-44115-P and ECO2014-55221-P). Emili Tortosa-Ausina also
acknowledges the financial support of Generalitat Valenciana (PROMETEOII/2014/046
and ACOMP/2014/283) and Universitat Jaume I (P1.1B2014-17). All four authors are
grateful to José Manuel Cordero and three referees whose comments contributed to
an overall improvement of the paper. The usual disclaimer applies.
∗ Corresponding author. Tel.: +34 935811539; fax: +34 935812555.
E-mail address: diego.prior@uab.cat (D. Prior).
to develop educational policies.1 The available empirical evidence
in this regard has contributed to strengthen this tendency, show-
ing that well designed accountability systems (i.e. those which
identify the responsibility attributable to each of the participants in
the educational system) enable organizational improvement inside
each school ( Rouse, Hannaway, Goldhaber, & Figlio, 2007 ), as well
as optimizing the educational outcomes ( Carnoy & Loeb, 2002;
Hanushek & Raymond, 2005 ). An underlying requisite of any ac-
countability system is to use a robust methodology to disentangle
what share of the students’ achievement can be attributed to the
school, and what share is simply the result of other variables be-
yond the school’s control.
In terms of methodology, the general consensus is that stu-
dents’ educational achievement depends both on their personal
characteristics as well as those of their school and context. In
order to analyze these scenarios, the most common and ac-
1 See, for instance, the detailed information regarding school performance
in the UK disclosed at http://www.education.gov.uk/schools/
performance/ .
http://dx.doi.org/10.1016/j.ejor.2016.01.023
0377-2217/© 2016 Elsevier B.V. All rights reserved.
Please cite this article as: C. Thieme et al., Value added, educational accountability approaches and their effects on schools’ rankings:
Evidence from Chile, European Journal of Operational Research (2016), http://dx.doi.org/10.1016/j.ejor.2016.01.023
cepted methodology is multilevel regression models ( McCaffrey,
Lockwood, Koretz, Louis, & Hamilton, 2004 ), also known as hier-
archical linear models, or regression models with random effects
( Goldstein, 2003; Raudenbush & Bryk, 2002 ). The key charac-
teristic of these methods is their capacity to disentangle what
proportion of variance in student achievement can be explained
by student variables (level 1), and what share can be explained by
aggregate, or school, contextual variables (level 2). When multiple
levels are considered, such as hierarchical systems of students
nested in schools, it is possible to obtain a better understanding
and measurement of the causes that explain students’ learning
processes ( Aitkin & Longford, 1986 ). The multilevel approach is
highly relevant when attempting to make decisions, specific to
each student, school, or context, that contribute useful information
to develop new improvement processes in schools, discourage
managers’ opportunistic behavior, signal a correct resource endow-
ments policy (by establishing rewards and penalties), and make
decisions on public policies.
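The variance-partitioning logic described above can be illustrated with a short simulation. The sketch below is purely illustrative (simulated data and a one-way ANOVA estimator rather than full multilevel machinery); all names and numbers are our own:

```python
import numpy as np

rng = np.random.default_rng(42)
n_schools, n_students = 50, 30

# Simulate two-level data: a school (level-2) effect plus student (level-1) noise
school_effect = rng.normal(0.0, 2.0, n_schools)            # between-school sd = 2
scores = school_effect[:, None] + rng.normal(0.0, 4.0, (n_schools, n_students))

# One-way ANOVA decomposition of the variance in student scores
school_means = scores.mean(axis=1)
within_var = scores.var(axis=1, ddof=1).mean()             # level-1 variance
between_var = max(school_means.var(ddof=1) - within_var / n_students, 0.0)

# Intraclass correlation: share of variance attributable to schools (level 2)
icc = between_var / (between_var + within_var)
print(f"ICC = {icc:.2f}")    # true value here is 4 / (4 + 16) = 0.20
```

The estimated ICC plays the role of the "share explained by level 2" discussed above; multilevel software generalizes this decomposition to models with covariates at each level.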
The initial stages of research on school accountability were
characterized by the use of cross sectional measures to estimate
school performance (e.g. the mean annual results of standardized
tests), but the current practice is to rely on panel data methods
to evaluate student performance in order to estimate the aca-
demic growth of students throughout their school life—ideally
also controlling for other relevant variables ( Goldstein et al., 1993;
Goldstein & Thomas, 1996; Gray, Jesson, Goldstein, Hedger, &
Rasbash, 1995; Mortimore, Sammons, & Thomas, 1994 ; Sammons,
1995 ). In this context of school accountability, the value added
(VA) of a given school can be broadly defined as the contribution
that it makes to students’ net progress (i.e. to the learning objec-
tives) after the effects of other variables, external to the school,
have been removed ( Meyer, 1997 ). The basic value added model
compares schools’ performance controlling for students’ previous
achievement.
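A minimal sketch of such a basic value added model (our own illustration with simulated data, not the estimator developed in this paper) regresses final scores on prior scores and averages the residuals by school:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_schools = 200, 10
school = rng.integers(0, n_schools, n)            # school attended by each student
prior = rng.normal(250, 50, n)                    # prior-wave test score
true_va = np.linspace(-5.0, 5.0, n_schools)       # each school's true contribution
final = 50 + 0.8 * prior + true_va[school] + rng.normal(0, 10, n)

# Regress final on prior achievement; residuals net out the incoming level
X = np.column_stack([np.ones(n), prior])
beta, *_ = np.linalg.lstsq(X, final, rcond=None)
resid = final - X @ beta

# Basic VA estimate: the mean residual of each school's students
va = np.array([resid[school == s].mean() for s in range(n_schools)])
```

Schools with positive `va` lift their students above what prior achievement alone predicts; this is the "net progress" notion the text refers to.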
More complex value added models are also available, and over
the last few years there has been a growing tendency to use con-
textual VA models, which allow researchers to control for socioeco-
nomic status (SES), ethnic background, gender, and other variables
that are not under the school’s control or responsibility. Thus, con-
textual VA models provide an estimation of the net performance of
schools by removing the effect of previous achievement and other
preexisting differences among students ( Ballou, Sanders, & Wright,
2004 ). It is generally agreed that contextual variables should be
used to estimate VA models, especially when setting some form
of accountability, or when disseminating the results, since results
might be questionable if they do not take into account contex-
tual characteristics of both students and schools. Although there
is no consensus as to what specific contextual variables should be
included in the model (Tekwe et al., 2004), socioeconomic status
(SES) is usually one of them.
According to their characteristics, VA indicators emerge as an
attractive methodology for several actors interested in measuring
or improving school performance, including: (i) governments
(which need to rely on objective accountability measures); (ii)
politicians (who want to guarantee that the assessment of schools
considers their ethnic and socioeconomic diversity); (iii) re-
searchers (who need to study those factors contributing to school
effectiveness using net indicators, which are not spuriously con-
taminated by the characteristics of students); (iv) teachers and
school managers (who want objective measures of their perfor-
mance, tuned for their specific student populations); (v) parents
(who need to choose schools for their children according to their
real capacity to add value to their students); and (vi) society as
a whole, since this entails a more accurate and fair evaluation of
the schools in the country ( Drury & Doran, 2003; McCaffrey et al.,
2004 ).
It is also crucial to understand that school effectiveness
studies—including VA analysis—require using some kind of
methodology to compare the schools being evaluated with a
benchmark. In the case of VA research, the most popular method-
ology is multilevel regression models (see, for instance Goldstein
et al., 1993; Gray et al., 1995; Cervini, 2009; Goddard & Goddard,
2001 ). An implicit assumption of this approach is to use the av-
erage school as a benchmark. However, this approach is not free
from criticisms such as, for instance, that using the average as a
benchmark is not an incentive for excellence ( Bock, Wolfe, & Fisher,
1996; Kupermintz, 2003; McCaffrey, Lockwood, Koretz, & Hamilton,
2003), that traditional VA models require test scores to be
vertically scaled, and that the appropriate functional form for the
model is not guaranteed in advance (Murphy, 2012; Ray, Evans, &
McCormack, 2009).
An attractive approach to overcome this criticism is to consider
the models derived from the activity analysis literature, which
evaluate school performance by comparing any given school with
the best observed performance. Instead of using a regression line
as a benchmark, these methodologies consider a nonparametric
frontier built either using Data Envelopment Analysis (DEA), or
its nonconvex variant, namely, Free Disposal Hull (FDH).2 In addition
to explicitly defining an optimal benchmark, frontier models
also allow several outputs to be used simultaneously (i.e. several
concurrent measures of student and school performance), offering
greater flexibility to estimate VA.
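For concreteness, an output-oriented FDH score can be sketched as follows (a toy implementation on assumed data, not the exact estimator used later in the paper): each school is compared with every observed school using no more of any input, and the score is the largest equiproportionate output expansion those dominating peers justify.

```python
import numpy as np

def fdh_output_score(X, Y, k):
    """Output-oriented FDH efficiency of unit k.

    X: (n, p) inputs, Y: (n, q) outputs. A score of 1 means unit k is
    on the FDH frontier; a score > 1 is the factor by which all of its
    outputs could be expanded given some observed dominating peer.
    """
    dominates = np.all(X <= X[k], axis=1)          # peers using no more of any input
    ratios = np.min(Y[dominates] / Y[k], axis=1)   # feasible expansion per peer
    return float(ratios.max())

# Toy data: 1 input, 1 output
X = np.array([[1.0], [1.0], [2.0]])
Y = np.array([[2.0], [4.0], [5.0]])
scores = [fdh_output_score(X, Y, k) for k in range(3)]
print(scores)   # → [2.0, 1.0, 1.0]: unit 0 could double its output; units 1, 2 are efficient
```

With several output columns the same function handles multiple concurrent performance measures, which is the flexibility the text highlights.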
In this line of research, there has been a growing interest in
developing approaches to estimate school effectiveness. For in-
stance, Silva Portela and Thanassoulis (2001), De Witte, Thanassoulis,
Simpson, Battisti, and Charlesworth-May (2010) and Portela,
Camanho, and Keshvari (2013) have developed methodologies to
estimate basic VA models, whereas Thieme, Prior, and Tortosa-
Ausina (2013) have proposed a model to analyse contextual ef-
fects in multilevel settings with cross-sectional data. However,
the existing methodologies have not been able to estimate con-
textual VA, namely, to develop a frontier model able to esti-
mate school VA effects controlling for students’ previous achieve-
ment, and also for contextual variables at student and school lev-
els. This development is crucial to further explore the use of
frontier models to estimate contextual VA models in real world
applications.
For this reason, the aim of this paper is both empirical and, to a
lesser degree, methodological. Regarding the latter (at the method-
ological level) we propose a robust frontier model to estimate
contextual value added (CVA) which combines both methodolog-
ical contributions from multilevel modeling to school VA, as well
as relevant proposals in the field of activity analysis methods—
namely, the so-called metafrontiers ( Battese, Rao, & O’Donnell,
2004 ) as well as the partial frontier methods ( Cazals, Florens, &
Simar, 2002 ).
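The partial frontier idea of Cazals et al. (2002) can be sketched, for the single-output case, as follows (a Monte Carlo toy version on assumed data, not the full order-m estimator used later): instead of benchmarking a unit against the best of all dominating peers, it is compared with the expected best of m peers drawn at random, which softens the influence of outliers.

```python
import numpy as np

def order_m_output_score(X, y, k, m=25, B=2000, seed=0):
    """Monte Carlo order-m output efficiency of unit k (single output)."""
    rng = np.random.default_rng(seed)
    dom = np.all(X <= X[k], axis=1)                    # peers using no more input
    draws = rng.choice(y[dom], size=(B, m), replace=True)
    # Expected best output among m randomly drawn dominating peers
    return float(draws.max(axis=1).mean() / y[k])

X = np.array([[1.0], [1.0], [1.0], [2.0]])
y = np.array([2.0, 4.0, 40.0, 5.0])                    # unit 2 is an outlier
fdh = y[np.all(X <= X[0], axis=1)].max() / y[0]        # full-frontier FDH score: 20.0
partial = order_m_output_score(X, y, 0, m=5)
print(fdh, partial)   # the order-m score stays below the outlier-driven FDH score
```

As m grows, the order-m score converges to the full FDH score; small m is what gives the estimator its robustness to outliers.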
Regarding the former (at the empirical level), we use this novel
approach to analyze school effectiveness in Chile. This applica-
tion is especially relevant for this country which, since the 1980s,
has been implementing a series of reforms to its educational sys-
tem (see Mizala & Romaguera, 2000), with strong emphasis on
accountability measures. Among other reforms, the government
transferred the management of public schools from the Ministry
of Education to city councils, and allowed for the participation of
private schools in the public system through a voucher system.
Simultaneously, an accountability system was created, consisting
2 We can also find parametric variants to this literature, among which SFA
(Stochastic Frontier Analysis) is the most popular. Parametric and nonparametric
methods have both advantages and disadvantages, some of which have been re-
cently outlined by Badunenko, Henderson, and Kumbhakar (2012).
of national standardized tests of educational achievement applied
annually to all students in 4th, 8th or 10th grade. The average
school results of this assessment, called the SIMCE test ( Sistema
de Medición de la Calidad Educativa , or Measurement System of Ed-
ucational Quality), are reported to parents, and used by the Min-
istry of Education as a measure of school quality. More recently,
new laws have been passed which further emphasize the use of
accountability measures. Specifically, Law #20529 features a new
educational quality assurance system (Sistema Nacional de Asegu-
ramiento de la Calidad de la Educación) and a new national edu-
cation quality agency (Agencia de Calidad de la Educación). This
public institution must classify educational institutions into four
groups according to their school effectiveness and school perfor-
mance. In particular, in its article 18 this law
indicates that the classification should be based primarily on the
SIMCE average results achieved by the educational institutions. In
addition, the law allows the classification procedure to include
VA models, stating explicitly that the agency should consider the
results from the learning process in all the evaluated areas as well
as the characteristics of the school's students including, among
others, their vulnerability and, when applicable, indicators of
progress or value added (article 17).
Currently this agency has developed several alternative methodolo-
gies for performing the classification of schools, but none of them
has included measures of value added—despite the fact that the
Law #20529 contemplates it, and it is a possible alternative con-
sidered in some research initiatives. Moreover, the methodologies
considered correspond to variations on what we will later refer to
as Model 1.
The resulting classification will obviously be contingent on the
model chosen, the estimation procedures, and data availability.
These choices might therefore have important consequences for
Chilean schools, since the Law also contemplates measures rang-
ing from intervention in organizational matters to the closure of
low-performing schools. This article aims, among other issues, to con-
tribute to the existing debate in Chile on this issue, comparing the
effect on the classification of schools using value-added vs. status
models (such as those that the agency is currently contemplating).
This is possible due to some particularities of the SIMCE test cal-
endar: 2009 was the first year in which the students taking the
SIMCE test in 8th grade had also taken the test in 4th grade, in
2005.
This scenario allows us to apply our model to a large sample—
47,076 students from 948 primary schools. All students took Math-
ematics and Language SIMCE tests. The sample was made up of 4th
and 8th grade students (9 and 13 years old, respectively), for whom
we have socioeconomic information on their families (at student
level), obtained via a questionnaire for parents. In an attempt to
achieve a reliable and homogeneous sample at school level, we only
included students who took both exams and for whom we had so-
cioeconomic information, and only schools with more than 30 stu-
dents meeting these criteria, provided that these students repre-
sented at least 60 percent of the students at the school who took
the exam in 2009.
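Expressed programmatically, this selection rule might look like the following sketch (the record fields, helper names and data are our own, purely illustrative):

```python
# Hypothetical student records: each needs both test waves and SES data
students = [
    {"school": "A", "score_2005": 250.0, "score_2009": 270.0, "ses": 0.3},
    # ... one dict per student
]

MIN_STUDENTS = 30      # more than 30 usable students per school
MIN_COVERAGE = 0.60    # usable students must be at least 60% of 2009 test-takers

def usable(s):
    """A student is usable if both scores and SES information are present."""
    return all(s.get(k) is not None for k in ("score_2005", "score_2009", "ses"))

def filter_schools(students, takers_2009):
    """takers_2009: dict mapping school -> number of students who sat the 2009 test."""
    kept = {}
    for s in students:
        if usable(s):
            kept.setdefault(s["school"], []).append(s)
    return {
        school: group
        for school, group in kept.items()
        if len(group) > MIN_STUDENTS
        and len(group) / takers_2009[school] >= MIN_COVERAGE
    }
```

Both thresholds act jointly: a school with many usable students is still dropped if they are an unrepresentative fraction of its 2009 test-takers.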
The rest of the article is organized as follows. In Section 2 we
describe the relevant theoretical framework. In Section 3 we de-
tail the methodology used. The background of the empirical appli-
cation and the description of the database used are presented in
Section 4 . The comparative results between models are discussed
in Section 5 , and the main conclusions of the study are outlined in
Section 6 .
2. Theoretical framework
2.1. The assessment model
All national or state accountability systems attempt to improve
learning and instruction processes, but they differ significantly in
the way they control for both the quality and progress of schools.
This heterogeneity leads to different perceptions of which schools
should be rewarded, and which should be encouraged to improve,
among other recommendations.
There are several frameworks to classify these evaluation mod-
els ( Carlson, 2001 ), but most of them take into account two fun-
damental aspects. First, one may distinguish between two dif-
ferent approaches for monitoring school performance, namely,
status models and growth models. Status models use a single year
to evaluate students’ academic achievement (i.e. cross sectional
data), whereas growth models use two or more years (i.e. panel
data). Second, in both approaches one may distinguish between
models that use contextual variables (both at student and school
level) to evaluate school achievement, and those which do not. This
scenario, and the core questions that these models try to answer,
can be represented in a 2 × 2 matrix, as in Table 1.
As Table 1 shows, Model 0 (which is also referred to as type
0 model) only considers for the evaluation the outputs related to
students’ academic achievement in a given time period. In the lit-
erature on education these models are usually referred to as aca-
demic achievement status models without contextual variables. As
indicated by Tekwe et al. (2004), the distinguishing characteristic
of status-based models is the absence of adjustment for students'
incoming knowledge level. This implies that differences among
schools in the average knowledge of their incoming students
are conflated with the assessment of teaching qual-
ity. An implicit assumption is that all students and schools have
optimal and similar backgrounds. Therefore, accountability systems
based on this model consider that students’ academic achievement
is entirely attributable to schools, disregarding evidence in the lit-
erature that a large share of students’ academic achievement might
be attributable to contextual factors, which are non-controllable,
and not attributable to the school itself (Teddlie & Reynolds, 2000).
The strongest criticisms suggest that this model could generate
perverse incentives for the attainment of the objective being pur-
sued, endowing fewer resources to those students with relatively
Table 1
Types of evaluation models.

Status (one student assessment):
- Without contextual variables. Model 0 (status): What is the level of academic achievement of the students in this school?
- With contextual variables. Model 1 (contextual status, CS): What is the level of academic achievement of the students in this school, according to the students' and/or school's contextual factors?

Value added (two or more student assessments):
- Without contextual variables. Model 2 (VA, value added): Is this an effective school? Given the achievement of students upon enrollment, how much do they learn or develop while they are at school?
- With contextual variables. Model 3 (CVA, contextual value added): Is the school more effective? Given students' achievement level upon entrance, how much do they learn, or develop, while they are at school, according to either the students' or the school's contextual factors?
worse results who do not help their schools to achieve their ob-
jectives. Simultaneously, this could generate selection of students
within schools, or lead to self-selection ( Wilson, 2004 ). In spite of
these disadvantages, we analyze Model 0 as a first step, because it
has been the approach used in Chile to evaluate its schools since
1999. Therefore, it is of interest to compare its results with those
that could be yielded by the other models proposed in this study.
Model 1 extends the variables considered. While, analogously
to model 0, it includes the outputs related to students’ academic
achievement at a given moment of time, it also considers input
variables not attributable to the school, either at student or at
school level. This model corresponds to an academic achievement
status model with contextual variables, according to the literature
on education. A recent example of this type of approach is the
study by Thieme et al. (2013) , which proposes a multilevel model
incorporating contextual variables both at student and school levels.
Despite the remarkable progress it represents, by not considering
students' initial academic achievement as an input, this model
assumes that initial achievement is the same, and optimal, for all
students, a situation which is obviously far from reality and could
lead to misinterpretation of results. It should be noted that the methodologies recom-
mended by the Chilean agency for the quality of education (“Agen-
cia de Calidad de la Educación”) for the classification of schools
correspond to this family of models.
Model 2 corresponds to a “pure” value-added model; the only
inputs and outputs it considers are the results of students’ aca-
demic achievement, both at the beginning and at the end of the
educational process under evaluation. The educational research lit-
erature considers that value added measures (gain) are more in-
formative measures of the effectiveness of institutions, since they
allow the effect of the school’s student progress to be isolated
( Wilson & Piebalga, 2008 ), and contribute to reducing incentives
for dishonest behavior. Two relatively recent studies are consistent
with this model, namely, Silva Portela and Thanassoulis (2001) and
De Witte et al. (2010) . However, as in the previous model, they
have the disadvantage of not controlling for non-school elements
which influence this particular process.
Model 3 overcomes the disadvantages of models 1 and 2. In the
education research literature this type of model emerged strongly
as a refinement of measures of growth, and has been called CVA
(contextual value added). The CVA was first used in 2006 in British
schools, and it is a measure intended to isolate the real impact of
the school on students’ progress. This type of modeling involves
obtaining results that consider a number of factors such as gender,
ethnicity, and language of origin, among others. The difference be-
tween the model estimate and the result that the student actually
achieves is what is referred to as CVA ( Wilson & Piebalga, 2008 ).
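In regression form, the CVA idea can be sketched as the school-level mean of residuals from a model controlling for both prior attainment and context (an illustrative OLS version on simulated data; the English CVA measure relies on a multilevel specification):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_schools = 300, 15
school = rng.integers(0, n_schools, n)
prior = rng.normal(250, 50, n)                     # prior achievement
ses = rng.normal(0.0, 1.0, n)                      # student-level SES index
final = 40 + 0.8 * prior + 6.0 * ses + rng.normal(0, 10, n)

# Controls: intercept, prior score, student SES and school-average SES
ses_school = np.array([ses[school == s].mean() for s in range(n_schools)])
X = np.column_stack([np.ones(n), prior, ses, ses_school[school]])
beta, *_ = np.linalg.lstsq(X, final, rcond=None)
resid = final - X @ beta

# CVA estimate: school-mean residual after prior attainment and context
cva = np.array([resid[school == s].mean() for s in range(n_schools)])
```

In this simulated data there is no true school effect, so the estimated CVA values scatter around zero; a real school's CVA is the gap between its students' actual results and the results the controls predict.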
There is an open discussion about this point, with arguments
for and against using several contextual variables in VA models
(Timmermans, Doolaard, & de Wolf, 2011; Willms & Raudenbush,
1989). In our case, we have decided to include only the SES, ba-
sically for three reasons. First, it is important to understand that,
in the Chilean context, the most important contextual variable ex-
plaining school performance is SES, as repeatedly demonstrated
in many studies ( Auguste & Valenzuela, 2003; Bellei, 2009; Con-
treras, Sepúlveda, & Bustos, 2010; Gauri, 1999; Hsieh & Urquiola,
2006 ). For instance, Manzi, Strasser, San Martín, and Contreras
(2008) show that a very large and stable percentage of between-
school variance is accounted for by socioeconomic factors; in fact,
over 60 percent of this variance is explained by the combination
of the individual SES and school average SES. This result is con-
sistent with PISA decomposition of variance ( OECD, 2010 ), which
shows that Chile is one of the countries with the largest percent-
age of between-school variance explained by socioeconomic fac-
tors. Manzi et al. (2008) also show that the effect of other variables
is negligible, and that the type of school does not explain a rele-
vant share of between school variance once socioeconomic factors
are controlled for. These results corroborate the extent of socioe-
conomic segregation in the Chilean educational system (see also
Carrasco & San Martín, 2012; Valenzuela, Bellei, & Ríos, 2014 ).
Considering the previous arguments, it is important to under-
stand that, in the case of Chile, the most important contextual vari-
able to analyze school performance is SES both at individual and
school (average) levels, and due to this reason we include these
variables and no other. Second, and from a broader perspective,
models with different types of contextual variables usually yield
similar results ( Timmermans et al., 2011 ), which would include
both basic and advanced value added models ( Harris, 2011 ). How-
ever, interestingly, although different VA models might use differ-
ent contextual variables, there is a wide consensus on the inclu-
sion among them of SES (e.g. see the OECD report on VA models
in different countries). Third, using SES as the only control variable
enables achieving a balance between the amount of information
used and the sample size. It is debatable if the increased amount of
information considered when including more contextual variables
offsets the problems derived from the curse of dimensionality, a
relevant issue to control for in nonparametric models for efficiency
measurement (Simar and Wilson, 2008, p. 441, chap. 4). Actually,
in our particular case the number of contextual variables to be
controlled for is relatively high (four) and, therefore, we consider
this to be a case in which a composite index may be particularly
convenient.
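A composite index of this kind can be constructed, for example, as the first principal component of the standardized contextual variables (a generic sketch on simulated data; the paper's own index construction is described later):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
latent = rng.normal(size=(n, 1))           # hypothetical shared SES factor
raw = latent + rng.normal(size=(n, 4))     # four observed contextual variables

# Standardize each variable, then take the first principal component
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
index = Z @ Vt[0]                          # one composite contextual score per unit
```

Collapsing four contextual variables into a single score is what mitigates the curse of dimensionality mentioned above, at the cost of discarding variation orthogonal to the first component.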
Despite the great advances that models 1 and 2 represent,
when activity analysis methods are considered to evaluate them,
model 3 in Table 1 best isolates the real impact of the school
on students’ progress, as indicated by many contributions from
the traditional literature on school evaluation—which generally use
parametric multilevel analysis. Therefore, evaluating model 3 con-
sidering activity analysis methods has some unexplored advantages
that will be part of our aims. It should be considered that, up to
now, the Chilean agency for the quality of education has not con-
sidered type 2 or type 3 models for the classification of schools.
2.2. The evaluation methodology
As indicated in the introduction, in recent years there has been
considerable progress in the evaluation methodology of school
performance, especially regarding the development of multilevel
models ( Bryk & Raudenbush, 1992; Goldstein, 1995 ). The general
concept is that students’ academic achievement depends on their
personal characteristics, and the characteristics of the school,
and its context. To analyze these situations, the different levels
are considered as hierarchical systems of students and schools,
with individuals and groups defined in separate hierarchies, using
variables that are defined at each level ( Hox, 2002 ).
This significant progress can solve the main methodological
problem of the pioneering studies in this field, by breaking down
the various nested effects that explain students’ educational out-
comes. The percentage of student achievement due to the different
variables at different organizational levels—district, school, class,
and student—can also be determined.
In this particular area, there are many statistical models for es-
timation which differ in several regards, such as the definition and
inclusion of adjustment variables (Tekwe et al., 2004). However,
the most prominent position is to include adjustment variables,
especially when establishing some form of accountability or dis-
seminating the results, since equity is questionable if the back-
ground characteristics of students and schools are not taken
into account (McCaffrey et al., 2004; McCaffrey et al., 2003).
Despite the many methodological and empirical contributions,
this research is not without its criticisms ( Kupermintz, 2003; McE-
wan, 2003 ). One of them is related to the nature of their estimate
as a comparison with the average, implying no real incentive for
performance excellence. Indeed, the vast majority of value-added
studies use multilevel regression analysis.
An alternative is found in the models that consider activity
analysis techniques, mainly using nonparametric frontier methods
(mostly Data Envelopment Analysis, DEA, and its nonconvex coun-
terpart, Free Disposal Hull, FDH). They provide relevant advantages
such as the ability to benchmark against optimal performers or, more
importantly, the possibility of specifying several inputs and outputs
simultaneously. In the field of education many studies have adopted these
techniques (see, for instance, Mizala, Romaguera, & Farren, 2002;
Thanassoulis, Kortelainen, Johnes, & Johnes, 2011; Portela, Camanho, &
Keshvari, 2013; Johnes & Johnes, 2009, among others). However, these
methods are not exempt from general criticisms. On the one hand,
regarding the nature of these methods,
their deterministic and probabilistic features, the curse of dimen-
sionality, or their heavy reliance on the absence of outliers have
been a source of continuous concern. On the other, in the specific
field of education, their main disadvantage has been to consider
only student-level data, which yields estimations that incorrectly
assume that schools are operating with the optimal endowment of
inputs (both controllable and uncontrollable), thus failing to
establish a multilevel analysis.
From the perspective of the non-parametric deterministic tech-
nology, Cazals et al. (2002) pointed out the problem of lack of sta-
tistical properties and the impact that the presence of outliers can
cause. From the perspective of student data, another criticism is
that multilevel analysis ignores the resources schools allocate in
order to sustain the education process (McCaffrey et al., 2004;
McCaffrey et al., 2003). Both problems are addressed in this paper,
by providing an integrated approach that combines metafrontier
approaches (Battese et al., 2004) and robust partial order-m
frontiers (Cazals et al., 2002).
Some previous research initiatives have taken these consid-
erations into account. Specifically, our aims and methods are
consistent with previous literature such as Silva Portela and
Thanassoulis (2001) who, following model 2, decompose the
overall efficiency into two different effects, namely, school effect
and student-within-school effect. Later on, De Witte et al. (2010)
refined this methodology, proposing a robust approach based on
Cazals et al. ’s (2002) ideas for the estimation. A more recent
contribution by Thieme et al. (2013) , based on model 1, consid-
ered both ideas, performing a multilevel decomposition in which
additional variables are factored in so as to provide a more com-
prehensive analysis. However, despite their interest, these previous
studies disregard the existence of contextual factors—at both the
student and school level—in the assessment of school performance.
In contrast, our proposal here is based on the definition of a
contextualized value-added robust multilevel nonparametric fron-
tier assessment that separates the net effects of student and
school, controlling for socioeconomic status, both at the student
and school level, eliminating (or at least drastically reducing) the
potential problems caused by the existence of outliers and dimen-
sionality problems, as will be explained in the following section.
3. Methodology
3.1. The decomposition of overall efficiency
Following the rationale described in the above paragraphs, our
model is inspired by Silva Portela and Thanassoulis ’s (2001) initial
contribution, in which two frontiers are considered, namely, the lo-
cal and the global frontiers. Whereas the former is specific to each
school, and oriented to an estimation of student-within-school ef-
ficiency, the latter is used to estimate student within-all-schools
efficiency. The so-called student’s effect (henceforth STE ), or stu-
dent’s efficiency, will determine the distance to the local frontier.
In contrast, the school’s effect (henceforth SCE ), or school’s effi-
ciency, refers to the distance separating the local and the global
frontiers. Model 2 in Fig. 1 illustrates the ideas underlying both
effects.
In Fig. 1, student c achieves an output level represented by y_c,
corresponding to an input level x_c (the score achieved by the
student in a previous academic year). When student c's academic
performance is compared with the local frontier (which corresponds
to the school in which student c is enrolled, i.e. school d), one may
notice that student c is inefficient. This occurs because there are
more efficient students enrolled in the same school d who achieve
better results (y_2) using the same inputs, or previous knowledge
(x_c). Therefore, the student's effect, or what Silva Portela and
Thanassoulis (2001) refer to as the student-within-school's
efficiency, is determined as the ratio of the potential to the actual
output, i.e. STE_2 = y_2/y_c. This student's effect is higher than
unity when the student is inefficient (as in the case presented in
Fig. 1), and equal to unity otherwise. The efficiency coefficient for
the student under analysis will be OE_2 = y*_2/y_c when compared to
the overall frontier, or the student-within-all-schools' efficiency
in the terms used by Silva Portela and Thanassoulis (2001). Having
these two reference frontiers, the school's effect (SCE_2, a sort of
technology-gap ratio separating the school-specific frontier from
the overall frontier) is determined by comparing the overall and
local frontiers (SCE_2 = y*_2/y_2 = OE_2/STE_2).
In summary, the proposal of Silva Portela and Thanassoulis (2001)
decomposes the global efficiency into two effects, namely:

Overall efficiency (OE_2) = Student's effect (STE_2) × School's effect (SCE_2)   (1)
According to the taxonomy proposed in Table 1 , this decompo-
sition corresponds to model 2, or pure value added . As mentioned
above, we partially follow this proposal as we are interested in
the student as the unit of analysis. However, in contrast to Silva
Portela and Thanassoulis (2001) , our aim is to develop a multilevel
decomposition of contextual value added ( CVA ). This implies consid-
ering not only students' academic results, but also contextual fac-
tors regarding students and schools. To that end, we follow a pre-
vious proposal ( Thieme et al., 2013 ), classifiable as a type 1 model
(considering only the contextual status but not the value added;
see Table 1 ), which introduces successive decompositions after the
consideration of specific variables. The final effect is the modifica-
tion of the school effect, after introducing contextual variables on
the average socio-economic level of the parents of students attend-
ing the same school.
In Fig. 1 we illustrate the differences between model 2 (consisting
of the assessment of the pure value added) and our proposal, which
gives rise to model 3 (a contextual value added assessment).
Regarding the student's effect, we see that y_2 > y_3 because in
model 3 we consider as inputs not only the previous scores but also
the socio-economic and cultural level of the student's family. This
implies that, in order to estimate y_3, we consider the student's
family context, whereas y_2 implicitly assumes that this context
does not interfere with students' scores. In other words, benchmark
y_3 comes from a student who, having the same previous scores and a
comparable socio-economic situation, achieves a better academic
outcome. In contrast, benchmark y_2 corresponds to another student
with a better socio-economic situation. The contextual variables
also have an impact on the school effect. Indeed, it is also clear
that y*_2 > y*_3, since the contextual socio-economic environment
could also affect the student's achievement in y*_3.
Fig. 1. Decomposition of the pure Value Added (model 2) and Contextual Value Added (model 3).

Summing up, when comparing models 2 and 3 we see that part of what
is considered as student's inefficiency (STE_2) is, according to
model 3, attributable to both the effect of the contextual variables
(STCE_3) and the net student's effect (STE_3). Analogously, the
school's effect from model 2 (SCE_2) can be decomposed in order to
account for the impact of the context due to socio-economic factors
(SCCE_3) and the net effect of the school (SCE_3). This implies that
a potential technology gap (represented by both STCE_3 and SCCE_3)
appears when the context has a significant impact on the scores that
students can achieve. In model 3 this gap may, or may not, be
significant, whereas in model 2, by definition, the impact is
nonexistent. To conclude, the decomposition corresponding to model 3
can be expressed as:
Overall efficiency = Contextual effect on efficiency × Net overall efficiency
                   = Contextual effect on efficiency × Student's effect × School's effect   (2)

or:

OE = (y*_2/y_c) = (y*_2/y*_3) × (y_2/y_3) × (y_3/y_c) × (y*_3/y_2)   (3)
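Decomposition (3) is purely multiplicative, so the four ratios telescope back to the overall efficiency y*_2/y_c. A minimal numerical check in Python, with purely illustrative benchmark levels (hypothetical values, not taken from the Chilean data):

```python
# Illustrative benchmark levels for one student (hypothetical values):
# y_c is the observed score; y_2, y_3 lie on the local frontiers of
# models 2 and 3; y_2s, y_3s (the starred values) lie on the overall frontiers.
y_c, y_3, y_2 = 50.0, 55.0, 60.0
y_3s, y_2s = 66.0, 72.0

SCCE_3 = y_2s / y_3s  # school-level contextual effect
STCE_3 = y_2 / y_3    # student-level contextual effect
STE_3 = y_3 / y_c     # net student's effect
SCE_3 = y_3s / y_2    # net school's effect

OE = SCCE_3 * STCE_3 * STE_3 * SCE_3
assert abs(OE - y_2s / y_c) < 1e-9  # the ratios telescope to y*_2 / y_c
print(round(OE, 2))  # → 1.44
```

The ordering of the factors matches Eq. (3): the two contextual effects, then the net student's and net school's effects.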
3.2. Using partial frontiers
When estimating inefficiency levels, the first decision to be
made is the specification of the technology, which has relevant im-
plications as different technologies could lead to different results.
Many previous applications have considered DEA models, implying
that a convex technology is assumed—i.e. each inefficient student
will be compared to her more efficient peers, or combinations of
them. In contrast, FDH models require comparison with an existing
student, and linear combinations are not allowed as a benchmark—
i.e. the convexity assumption is dropped.
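The difference between the two benchmarks can be made concrete with a single-input, single-output sketch; the student scores below are hypothetical, and the DEA benchmark is computed for the variable-returns-to-scale frontier segment joining the two efficient peers:

```python
# Hypothetical (prior score, current score) pairs for three students.
students = {"A": (40.0, 55.0), "B": (60.0, 75.0), "C": (50.0, 52.0)}
x_c, y_c = students["C"]

# FDH benchmark: the best observed output among students using no more
# input than C -- an actual peer, no combinations allowed.
y_fdh = max(y for x, y in students.values() if x <= x_c)

# DEA benchmark (variable returns to scale): the frontier between A and B
# is the segment joining them, so interpolate its height at x_c.
(x_a, y_a), (x_b, y_b) = students["A"], students["B"]
y_dea = y_a + (y_b - y_a) * (x_c - x_a) / (x_b - x_a)

print(y_fdh, y_dea)  # → 55.0 65.0: convexity yields the more demanding target
```

Student C is thus compared with the actual peer A under FDH, but with a fictitious convex combination of A and B under DEA.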
Both DEA and FDH have some shortcomings, among which we
may highlight the so-called curse of dimensionality and the poten-
tial impact of outliers. Some studies have established the statisti-
cal properties of the FDH estimator (Kneip, Park, & Simar, 1998;
Simar & Wilson, 2000), indicating that the dimensionality prob-
lems of the FDH models originate from their slow convergence
rates. However, their statistical properties are very appealing, since
they are consistent estimators for any monotone boundary—i.e.
by imposing only strong disposability. In addition, as shown by
Park, Simar, and Weiner (2000), FDH has additional advantages
over convex models, since the latter causes a specification error
when the true technology is nonconvex.³
In our study we will assume nonconvex technologies, which, in our
particular setting, implies that an existing student will be compared
with another existing albeit more efficient student. However, FDH
approaches also have some limitations and, therefore, we will
consider a partial frontier approach such as order-m (Cazals et al.,
2002), which is much more robust to both outliers and the curse of
dimensionality.⁴
Accordingly, we assume there is information available on the input
and output vectors x_c = (x_{c,1}, x_{c,2}, ..., x_{c,i}, ..., x_{c,I})
and y_c = (y_{c,1}, y_{c,2}, ..., y_{c,j}, ..., y_{c,J}), respectively,
for each student in the sample (1, 2, ..., C). We will then
characterize the elements of the integer activity vector as
λ = (λ_1, λ_2, ..., λ_C) and the efficiency coefficient as α_c^FDH.
Then, the output-oriented FDH efficiency scores will be yielded by
solving the following integer linear programming problem:

max_{α_c^FDH, λ_1, λ_2, ..., λ_C}  α_c^FDH,
s.t.  Σ_{s=1}^{C} λ_s x_{s,i} − x_{c,i} ≤ 0,   i = 1, ..., I,
      −Σ_{s=1}^{C} λ_s y_{s,j} + α_c^FDH y_{c,j} ≤ 0,   j = 1, ..., J,
      Σ_{s=1}^{C} λ_s = 1,
      λ_s ∈ {0, 1},   s = 1, ..., C.   (4)

Integer problem (4) identifies, for each student c that is not
FDH-efficient, another student in the sample with better performance
(the student with coefficient λ_s = 1), and then estimates the output
increase (α_c^FDH − 1) needed to reach the nonconvex frontier, with
α_c^FDH > 1. For FDH-efficient students, solving integer problem (4)
yields the activity vector λ_c = 1 as well as an efficiency
coefficient α_c^FDH = 1.
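Because the benchmark under FDH is a single actual peer, problem (4) need not be handed to an integer solver in practice: the output-oriented score can be obtained by enumeration as α_c^FDH = max over peers s with x_{s,i} ≤ x_{c,i} (for all i) of min_j y_{s,j}/y_{c,j}. A minimal sketch of this closed form, with hypothetical data:

```python
import numpy as np

def fdh_output_efficiency(X, Y, c):
    """Output-oriented FDH score for unit c by enumeration: among the
    peers using no more of every input than c, the benchmark is the one
    whose worst output ratio relative to c is largest."""
    peers = np.all(X <= X[c], axis=1)          # x_{s,i} <= x_{c,i} for all i
    ratios = np.min(Y[peers] / Y[c], axis=1)   # min_j y_{s,j} / y_{c,j}
    return float(np.max(ratios))               # >= 1, since c is its own peer

# Hypothetical data: rows are students.
X = np.array([[40.0], [50.0], [60.0]])  # inputs: prior scores
Y = np.array([[55.0], [52.0], [75.0]])  # outputs: current scores
print(round(fdh_output_efficiency(X, Y, 1), 4))  # → 1.0577 (peer 0, 55/52)
```

The enumeration and the integer program return the same score; the program formulation is kept in the text because it makes the single-peer (λ_s ∈ {0,1}) logic of FDH explicit.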
As indicated earlier, there are contributions in the literature
that provide methods to overcome the curse of dimensionality and
the effect of outliers inherent to FDH. Among them, the order- m
estimator ( Cazals et al., 2002; Simar, 2003 ) has become one of the
most popular methods to get round these issues while at the same
time maintaining the advantages of a nonconvex and nonparamet-
ric methodology.
According to this method, we will first consider a positive fixed
integer, m. For a given level of input (x_{c,i}) and output (y_{c,j}),
the order-m estimator defines the expected value of the maximum of m
random variables (y_{1,j}, ..., y_{m,j}), drawn from the conditional
distribution of the output matrix Y for which y_{m,j} ≥ y_{c,j}.
Formally, the proposed algorithm to compute the order-m estimator has
four steps:
1. For a given level of y_{c,j}, draw a random sample of size m with
   replacement among those y_{m,j} such that y_{m,j} ≥ y_{c,j}.
2. Compute program (4) and estimate α̂_c.
3. Repeat steps 1 and 2 B times and obtain B efficiency coefficients
   α̂_c^b (b = 1, 2, ..., B). The quality of the approximation can be
   tuned by increasing B (in most applications B = 200 seems to be a
   reasonable choice, but we decided to set B = 2000).
4. Compute the empirical mean of the B samples as:

   α̂_c^m = (1/B) Σ_{b=1}^{B} α̂_c^b   (5)

³ Some authors such as Thanassoulis and Silva Portela (2002) have
proposed other methods to overcome the issue of outliers, by
identifying and eliminating the extreme (super-efficient) cases;
however, this is a controversial approach, since these units can
convey relevant information. In the particular context of education,
eliminating super-efficient observations could lead to an increase of
overall efficiency, magnifying mediocrity and reducing the potential
efficiency gains that could be achieved.

⁴ In a recent paper, Krüger (2012) ranked the order-m estimation
method as dominated, under general conditions, by the stochastic
frontier, DEA and FDH methods. Its use should therefore be restricted
to cases characterized by a significant presence of outliers. This is
precisely the case of education, where some students with very
limited resources may achieve a brilliant academic record.
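The four steps above can be sketched as follows. This is a minimal Monte Carlo implementation assuming, as in the output-oriented order-m of Cazals et al. (2002), that the m peers are drawn with replacement among the units using no more input than the evaluated student; the data and the seed are illustrative:

```python
import numpy as np

def order_m_output_efficiency(X, Y, c, m=25, B=2000, seed=None):
    """Monte Carlo order-m score for unit c (output orientation): draw m
    peers with replacement among the units using no more input than c
    (step 1), score c against each draw as in FDH (step 2), repeat B
    times (step 3), and average the coefficients as in Eq. (5) (step 4)."""
    rng = np.random.default_rng(seed)
    peers = np.flatnonzero(np.all(X <= X[c], axis=1))
    alphas = np.empty(B)
    for b in range(B):
        draw = rng.choice(peers, size=m, replace=True)
        alphas[b] = np.max(np.min(Y[draw] / Y[c], axis=1))
    return float(alphas.mean())

# Hypothetical data: rows are students.
X = np.array([[40.0], [50.0], [45.0], [60.0]])  # inputs: prior scores
Y = np.array([[55.0], [52.0], [58.0], [75.0]])  # outputs: current scores
print(order_m_output_efficiency(X, Y, 1, m=10, B=2000, seed=0))
```

As m grows, every dominating peer is almost surely included in each draw and the score converges to the FDH coefficient; for moderate m, extreme peers are occasionally left out of the sample, which is what makes the estimator robust to outliers.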
The number of observations considered in the estimation approaches
the number of observed units that meet the condition y_{m,j} > y_{c,j}
as m increases, whereas the expected order-m estimator in each of the
b iterations, α̂_c^b, tends to the FDH efficiency coefficient
α_c^FDH. Therefore, m is an arbitrary positive integer value, but it
is always convenient to observe the fluctuations of the α̂_c^b
coefficients, which will ultimately depend on the level of m. α̂_c^m
will normally take values higher than unity for acceptable values of
m (indicating that these units are inefficient, as outputs can be
increased without modifying the inputs allocated). When α̂_c^m < 1,
unit c may be labeled as super-efficient, provided the order-m
frontier shows lower output levels than the unit under analysis.
In order to carry out the multilevel estimation, we adapt the
metafrontier approaches proposed by Battese and Rao (2002), Battese
et al. (2004) and O'Donnell, Prasada Rao, and Battese (2008). In the
case of model 3, this process has the following steps:
(a) Classify students (1, 2, ..., C) depending on the school they are
enrolled in (1, 2, ..., D).
(b) Complete steps 1 to 4 of the order-m algorithm to estimate the
efficiency coefficients corresponding to each student in the specific
school she/he is enrolled in (α̂_c^m), and with regard to the overall
frontier (that is, considering the school frontier point represented
by y_2 in Fig. 1 and the efficiency corresponding to the overall
frontier y*_2).
(c) After completing step (b), add new input variables (the
socio-economic and cultural level corresponding to the student's
family and the average of the same variable for the school) and apply
steps 1 to 4 of the order-m estimation again to the complete sample,
to estimate the efficiency coefficients with respect to the
metafrontier (y*_3) and to the local frontier (y_3). These new
coefficients provide an assessment of the student's efficiency with
respect to the overall metafrontier, taking into account only schools
operating with no better environmental factors than the school where
the student is enrolled (precisely what is represented by point y*_3