Direct vs Indirect Management Training in Schools:
Experimental Evidence from Mexico*
Mauricio Romero† Juan Bedoya‡ Monica Yanez-Pagans§ Marcela Silveyra§ Rafael de Hoyos§¶
October 12, 2021
Abstract
We use a large-scale randomized experiment (across 1,198 public primary schools in Mexico) to study the impact of providing schools directly with high-quality managerial training by professional trainers vis-à-vis through a cascade-style “train the trainer” model. The training focused on improving principals’ capacities to collect and use data to monitor students’ basic numeracy and literacy skills and to provide feedback to teachers on their instruction and pedagogical practices. After two years, the direct training improved schools’ managerial capacity by 0.13σ (p-value 0.018) relative to “train the trainer” schools, but had no meaningful impact on student test scores (we can rule out an effect greater than 0.08σ at the 95% level).
Keywords: School management
JEL Codes: I20, I25, H52, M10, O15
*Corresponding author: Mauricio Romero (mtromero@itam.mx). This study was possible thanks to the support of the Secretaría de Educación Pública (SEP) of Mexico. We are especially indebted to Pedro Velasco, Griselda Olmos, Lorenzo Baladrón, Germán Cervantes, Javier Treviño, and all the staff at SEP’s Directorate of Education Management. We are especially grateful to Raissa Ebner, Renata Lemos, and Daniela Scur for their collaboration in this project’s early stages and subsequent discussions. The design and analysis benefited from comments and suggestions from Mitch Downey, Enrique Seira, and Renata Lemos. Some of the results in this paper first appeared in the working paper “School Management, Grants, and Test Scores: Experimental Evidence from Mexico” (http://hdl.handle.net/10986/35108), and a draft of this paper was previously circulated under the title “The Effect of Improving School Management on Test Scores: Experimental Evidence from Mexico”. Karina Gómez provided excellent research assistance. The views expressed here are those of the authors alone and do not necessarily reflect the World Bank’s opinions. Romero gratefully acknowledges financial support from the Asociación Mexicana de Cultura, A.C. All errors are our own.
†ITAM
‡Universidad de Cantabria
§The World Bank
¶Xaber
1 Introduction
Schools are complex organizations that are often poorly managed. Across developed and developing countries, they tend to have worse management practices than hospitals and manufacturing firms (Bloom, Lemos, Sadun, Scur, & Van Reenen, 2014; Bloom, Lemos, Sadun, & Van Reenen, 2015). This is not surprising: in many countries, school principals are chosen according to seniority. As a result, although they have years of classroom experience, principals may lack management skills.
We study the implementation of the Government of Mexico’s large-scale Escuela al Centro (in English, “school at the center”) strategy, designed to strengthen school autonomy and improve principals’ managerial capacity. This strategy was implemented nationwide for three consecutive school years (2015–16, 2016–17, and 2017–18). A core component was managerial training for principals that focused on collecting and using data to monitor students’ basic numeracy and literacy skills and providing feedback to teachers on their instruction and pedagogical practices. We randomly assigned 1,198 eligible public primary schools to one of two groups: (1) a “train the trainer” group, which received managerial training under a cascade model in which 10% of school supervisors were trained by professional trainers, who then trained other supervisors, who in turn trained principals (n=599), and (2) a “direct training” group, in which principals received managerial training directly from a team of professional trainers (n=599).[1]
We collected data on schools’ managerial practices, using the Development World Management Survey (DWMS) (Lemos & Scur, 2016), at baseline (in late 2015) and two years after the program was implemented (in early 2018). The DWMS measures different dimensions of schools’ managerial practices, including operations management, people management, target setting, and monitoring. To measure students’ learning, we use data from a nationwide standardized test (PLANEA).[2]
Our results show a significant improvement of 0.13 standard deviations (σ hereafter; p-value 0.018) in managerial capacities among “direct training” schools compared to “train the trainer” schools. These improvements in managerial capacities do not translate into meaningful impacts on student learning. Students in “direct training” schools have test scores that are 0.03σ higher than their counterparts in “train the trainer” schools. However, this difference is not statistically significant (p-value 0.24), and we can rule out an effect greater than 0.08σ on test scores at the 95% level. There is little evidence of heterogeneity in treatment effects by baseline school characteristics.

[1] Supervisors are the direct link between schools and educational authorities in each state. Supervisors are typically in charge of 8 to 20 schools (Santiago, McGregor, Nusche, Ravela, & Toledo, 2012).

[2] Plan Nacional para la Evaluación de los Aprendizajes (PLANEA) was designed by the Mexican Education Evaluation Institute and measures math and Spanish learning outcomes in grades 6, 9, and 12. PLANEA is aligned with the national curriculum and applied to a sample of students in all Mexican schools. In schools with fewer than 40 students in the assessed grade, every student is tested; in those with more than 40 students, a random sample is tested.
The failure of “direct training” to significantly improve learning outcomes could be related to the weak contemporaneous correlation between managerial practices and test scores in Mexico (as measured by the DWMS). Our baseline data show that a 1σ improvement in managerial practices is associated with an increase of less than 0.1σ in test scores, a weaker correlation than Bloom, Lemos, et al. (2015) reported for several countries. However, even assuming a stronger link between management and test scores (an increase of 0.4σ in test scores as the management index increases by one standard deviation), based on the results from Bloom, Lemos, et al. (2015), an increase of 0.13σ in management practices should yield an increase in test scores of 0.029σ; the actual treatment effect was 0.03σ.[3] Overall, the expected treatment effects on learning outcomes (assuming previous correlational evidence is causal and given the treatment effects on management practices) are of the same order of magnitude as the actual treatment effects. While the intervention improved management practices, these improvements did not generate statistically significant changes in learning outcomes (even with a sample size of 1,198 schools). The fact that we do not find treatment effects on test scores is not due to a lack of power: our ex-post minimum detectable effect (MDE) for test scores is 0.081σ (with power of 80% and size of 5%) (Ioannidis, Stanley, & Doucouliagos, 2017; McKenzie & Ozier, 2019). Rather, this result likely implies that larger effects on management practices are needed to find economically meaningful effects on test scores.[4]
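For concreteness, the ex-post MDE can be reproduced from the clustered standard error of the treatment estimate on test scores (roughly 0.029σ; see Table 4). A minimal sketch, assuming the standard two-sided MDE formula:

```python
from scipy.stats import norm

# Ex-post MDE for a two-sided test: (z_{1-alpha/2} + z_{power}) * SE(gamma_1).
# The SE below is the clustered standard error on test scores from Table 4.
alpha, power = 0.05, 0.80
se_gamma1 = 0.029

mde = (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * se_gamma1
print(f"Ex-post MDE: {mde:.3f} sigma")  # ~0.081 sigma, matching the text
```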
One way to boost the intervention’s impact on management practices would be to increase principals’ attendance at the training workshops. While “direct training” principals were about ten percentage points more likely to have completed courses or received counseling on how to carry out school director duties in the past, less than 25% completed the entire training (80 hours), and roughly 10% completed less than 20 hours of the training. Instrumental variable approaches suggest that boosting attendance at the training workshops would result in further improvements in management that would translate into meaningful impacts on student learning outcomes. However, given measurement error in the attendance data, we take these results as suggestive evidence that requires further confirmation in future studies.

[3] Alternately, using our own data, and under some strong assumptions that allow us to use the treatment as an instrument for DWMS scores, our treatment effect on the DWMS implies an expected increase of 0.065σ in test scores, given the treatment effect on DWMS scores.

[4] Alternatively, it could be the case that schools in Mexico are so well managed that the returns to additional increases in management are relatively low. However, comparing the distribution of DWMS scores in our setting to those Bloom, Lemos, et al. (2015) find in other countries suggests this is not the case.
We contribute to the literature and policy debate on improving school management in low- and middle-income countries. Our study advances research that explores the relationship between school management and student outcomes (World Bank, 2007). Recent evidence, mostly from developed countries, demonstrates that management practices are an important determinant of school effectiveness. Using data for 39 charter schools in the United States, Dobbie and Fryer (2013) show that traditional school inputs such as class size and teaching certifications cannot explain differences in school effectiveness. However, school management practices, such as providing feedback to teachers and using data to guide instruction, are a significant determinant of school effectiveness (Fryer, 2014). In line with Fryer (2014)’s findings, Bloom, Lemos, et al. (2015) document a positive and statistically significant correlation between managerial practices and student learning outcomes. There is also evidence from India that learning outcomes and progress are positively correlated with managerial practices (Lemos, Muralidharan, & Scur, 2021). Our baseline data adds to the evidence base on the correlation between school management and learning outcomes. We find a weaker correlation between them than previous studies have identified, which could be partially explained by the low autonomy in the Mexican public education system; Bloom, Lemos, et al. (2015) show that higher school autonomy is correlated with higher management scores.[5]
Moreover, we provide experimental estimates of the relative effectiveness of two strategies to improve school principals’ managerial capacity, measured by their impact on management practices and student learning outcomes in a developing country. While there is evidence from the US that training programs to improve school principals’ managerial practices have a positive effect on student learning outcomes (Fryer, 2017), our evidence and findings from other developing countries suggest otherwise. A closely related paper by Muralidharan and Singh (2020) shows that an attempt to improve management quality in Indian schools by inducing principals to adopt “best practices” had no impact on student outcomes. India’s accountability and incentive structure for principals is rather weak (as is Mexico’s), which the authors argue may explain why improving managerial practices has little or no effect on test scores.[6]

[5] Mexican schools are less autonomous than schools in other Organization for Economic Co-operation and Development (OECD) countries (Hopkins, Ahtaridou, Matthews, Posner, & Figueroa, 2007; OECD, 2016).

[6] A second potential explanation for the lack of impact is that managerial practices may take longer to improve student education outcomes (see de Hoyos, Ganimian, and Holland (2020)).
2 Context and intervention
2.1 Context
Mexico’s primary education system (grades 1 to 6) has more than 14 million students and 573,000 teachers distributed across roughly 100,000 schools.[7] The system is highly decentralized: 32 state-level education systems follow a common national curriculum and general guidelines from the Federal Secretariat of Public Education (Federal SEP, from its acronym in Spanish). However, local governments are fully responsible for administering each state-level Secretariat of Public Education.
Access to primary education in Mexico is high, with over 98% of children aged 6 to 12 enrolled in the education system (World Bank, 2017b; Dirección General de Planeación, Programación y Estadística Educativa, 2018). However, the quality of education is low. Although almost all children graduate from primary school (World Bank, 2017a), fewer than half of them achieve basic proficiency in math and Spanish (and only one in three in marginalized areas) according to 2018 nationwide standardized tests (Instituto Nacional para la Evaluación de la Educación, 2018).
Mexico has three types of public primary schools: general primary schools (which teach most children), and indigenous and community schools, which serve roughly 800,000 and 400,000 students, respectively. The latter two types include many small, multi-grade schools.[8] The existence of a large number of small schools increases the governance challenges and requires tailored management models.

These governance challenges are compounded by high turnover among teachers and school principals and, until recently, the lack of a system to regulate the entry and promotion of teachers. Previously, the national teachers’ union influenced teachers’ (and school principals’) appointments (Álvarez, García-Moreno, & Patrinos, 2007). In 2013, the central government implemented a major education reform that defined and regulated a merit-based process to hire and promote teachers and principals. It also introduced the Escuela al Centro strategy to enhance principals’ managerial capacities to improve students’ learning outcomes.

[7] Unlike other countries in Latin America, Mexico has a small private education sector that accounts for only 10% of total primary enrollment (Elacqua, Iribarren, & Santos, 2018).

[8] The smallest 40% of primary schools in the country serve 8.5% of its primary school students. By comparison, Mexico has less than half of the student population of the United States, but 50% more schools.
2.2 The Escuela al Centro strategy
The government implemented the Escuela al Centro strategy nationwide for three consecutive school years (2015–16, 2016–17, and 2017–18). It had two main components: the provision of school grants and managerial training for school principals.[9]

The grant component consisted of an annual monetary transfer to schools that submitted an improvement plan approved by their school council. The grants ranged from USD 1,500 to 15,000 depending on the school’s size (about USD 5–50 per student). Schools used these grants to implement their annual improvement plans and pay for basic supplies and repairs. As explained in Section 3.1, all schools in our sample received these grants.
The training component focused on improving school principals’ capacity to collect and use data to monitor students’ basic numeracy and literacy skills and to provide teachers with feedback on their teaching styles. To implement this training, the Federal SEP developed two tools: (i) a student assessment to monitor foundational skills (Sistema de Alerta Temprana en Escuelas de Educación Básica, SisAT) and (ii) a Stallings classroom observation tool to provide feedback to teachers on how to improve their instructional and pedagogical practices.
The SisAT builds on evidence that providing school principals in Mexico with information on which areas of the national curriculum are most challenging for students, based on national standardized learning assessments, had positive effects on student learning (de Hoyos, García-Moreno, & Patrinos, 2017; de Hoyos, Ganimian, & Holland, 2019). It includes items from past national standardized assessments to measure students’ basic numeracy and literacy skills and identify lagging students in order to trigger early remedial actions. Teachers administer the SisAT and input the scores into a simple software program that generates a detailed report and flags students with significant learning gaps, as illustrated in the sketch below. The SisAT also pinpoints the most challenging areas of the national curriculum for students and classrooms. While schools were free to decide when to administer the SisAT, most did so at the beginning of the school year, to generate baseline measures for their school improvement plans, and throughout the school year, to monitor students’ progress.
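To fix ideas, the flagging step can be sketched as a simple threshold rule over per-skill scores. The cutoff, column names, and data layout below are illustrative assumptions, not SisAT’s actual scoring protocol:

```python
import pandas as pd

# Hypothetical SisAT-style report: flag students below a proficiency cutoff.
# Column names and the 0.5 cutoff are illustrative assumptions.
scores = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "literacy":   [0.82, 0.45, 0.60, 0.30],  # fraction of items correct
    "numeracy":   [0.75, 0.50, 0.35, 0.40],
})
CUTOFF = 0.5

for skill in ["literacy", "numeracy"]:
    scores[f"{skill}_lagging"] = scores[skill] < CUTOFF

# Students lagging in either skill would trigger early remedial action.
flagged = scores[scores[["literacy_lagging", "numeracy_lagging"]].any(axis=1)]
print(flagged[["student_id", "literacy", "numeracy"]])
```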
The Stallings classroom observation tool builds on evidence that using school principals to coach teachers improves student learning in Mexico (Secretaría de Educación Pública & Banco Internacional de Reconstrucción y Fomento, 2015). It collects information on the teacher’s use of time in the classroom, including the activities conducted, pedagogical practices, use of educational materials, and level of student engagement (Stallings, 1977; Stallings & Molhlman, 1988). The tool helps school principals systematically collect data to provide feedback to teachers on how to improve their teaching.

[9] The description of the Escuela al Centro strategy is available at http://www.dof.gob.mx/nota_detalle_popup.php?codigo=5488338, and the operating rules are available at http://www.dof.gob.mx/nota_detalle.php?codigo=5509544&fecha=29/12/2017.
The Federal SEP developed a high-quality training strategy, including learning materials, to help principals use the SisAT and the Stallings classroom observation tool. The training consisted of 40 hours of instruction per tool.[10] The SEP used a “train the trainer” cascade model to roll out the Escuela al Centro strategy throughout the country. State-level education authorities selected 10% of all primary school supervisors to receive the training from a professional team that included staff involved in designing the tools. The trained supervisors then provided training to the other supervisors in their state. After all supervisors in a state were trained (by either the professionals or their peers), they proceeded to train the school principals in their jurisdictions. To test the efficacy of the cascade model versus professional training, the SEP provided professional training to some school principals.

[10] These training materials are available at the Escuela al Centro website: https://escuelaalcentro.com/intervenciones/descarga-los-materiales/.
3 Research design and data
3.1 Sampling and randomization
To test the effectiveness of the professional training, the SEP invited all 32 states in the country to participate in an impact evaluation. The seven states that met the requirements (Durango, Estado de México, Morelos, Tlaxcala, Guanajuato, Tabasco, and Puebla) were selected to be part of this research study (see Figure A.1).[11]

The local education authorities invited all public primary schools in these seven states to apply for the school grant component of Escuela al Centro. We randomly assigned the 1,198 schools that applied for the grants to one of two groups: (1) “train the trainer” schools, which received a school grant and school principals’ managerial training using the cascade model (n=599), or (2) “direct training” schools, which received a school grant and professional training (n=599).[12]

[11] Of the 32 states in Mexico, 14 expressed interest in participating in the impact evaluation. However, only seven complied with the required paperwork.

[12] Some principals in “direct training” schools also benefited from short-term leadership certificate training programs offered by state-level education authorities. These programs focused on leadership issues, in line with the national school principal’s profile standards. As explained in more detail in Section 3.2, the DWMS (the instrument we use to measure principals’ overall managerial practices) does not take leadership practices into account. Appendix A.5 provides further details on the states’ short-term certification programs.
Our experimental design allows us to estimate the causal effects of using professional trainers vs. the cascade model to train school principals.[13]

Our sample included public primary schools that chose to participate in the program. To be eligible, schools had to have more than 60 students; those with at least one classroom with students from different grades were excluded.[14] Therefore, the schools included in the experiment have more students and teachers and are more likely to be urban than the average public primary school in Mexico (see Table A.1).

The randomization protocol varied slightly across the seven participating states. Broadly, schools were first stratified into groups (by enrollment and location) and then randomly assigned to either the treatment (“direct training” by professional trainers) or control (cascade-style training) group. Section A.3 details each state’s sampling and randomization strategy.

[13] While it is not possible to experimentally identify the impact of the cascade-style training vis-à-vis no training at all, there is evidence that cascade training models tend to be relatively ineffective (Popova, Evans, Breeding, & Arancibia, 2018).

[14] Small schools were excluded because the managerial intervention focused on training principals to coach teachers. In small schools, principals also teach and thus need different management models.
3.2 Data
We collected primary data on principals’ managerial practices and their perceptions of the quality of the training they received. We also use secondary data from administrative records provided by SEP that include: (i) student learning outcomes; (ii) a school marginalization index; and (iii) information on schools’ infrastructure, enrollment rates, and number of teachers. Our study period spans two school years, 2015–16 (baseline) and 2017–18 (follow-up). In addition, the baseline and follow-up months roughly coincide with the nationwide standardized test application dates, which allows us to measure the intervention’s impact on both management practices and student test scores.
3.2.1 Primary data
Information on schools’ managerial practices was collected using the DWMS, an adaptation of the World Management Survey (WMS), originally developed to measure the quality of management practices in manufacturing firms in developed (Bloom & Van Reenen, 2007) and developing countries (Bloom, Eifert, Mahajan, McKenzie, & Roberts, 2013).[15] The WMS and the DWMS were subsequently adapted to measure management quality in the education and health sectors (Bloom, Lemos, et al., 2015; Bloom, Propper, Seiler, & Van Reenen, 2015). The WMS and DWMS are fully comparable; the latter can better identify granular differences in management practices at the lower end of the management quality distribution, where most public schools and hospitals in developing countries are located.
The DWMS adaptation for measuring management practices in schools in developing countries consists of a recorded interview with the school principal. The interview includes 23 open-ended questions that collect information on four dimensions: operations management, people management, target setting, and monitoring.[16] The interviews, conducted by a team of two trained enumerators (one coder and one interviewer), lasted around two hours. While the DWMS is designed to be less subjective than the WMS, to overcome the lower capacity of enumerators in developing countries, there is still considerable room for enumerator subjectivity in data coding. We assigned the same team of trained enumerators to code the audio files from all the original interviews to ensure comparability over time. Unfortunately, 32% of the audio files from the baseline, and 16% from the follow-up, turned out to be damaged when we asked the enumerators to code the interviews. Schools with and without damaged audio files at endline are statistically indistinguishable in terms of observable characteristics, including treatment status (see Tables A.2 and A.3). Thus, our results are unlikely to be driven by differences in observable or unobservable characteristics between schools with and without functioning audio files. To ensure comparability across schools, we randomly assigned audio files to enumerators and control for enumerator fixed effects in all the regressions. We conducted the baseline DWMS surveys between October 2015 and May 2016, and the follow-up surveys from January to May 2018.[17]

For reference, we compare the distribution of management scores in our setting (at baseline) to the distributions in India, Brazil, and the US from Bloom, Lemos, et al. (2015); see Figure A.2. Overall, the average school in our setting has a higher management score than the average school in India (2.1 vs 1.7), a similar score to the average school in Brazil (2.1 vs 2.0), and a lower score than the average school in the US (2.1 vs 2.7). However, the dispersion in management practices in our setting is lower, which could be explained by the restrictions imposed on the experimental sample (e.g., excluding small multi-grade public schools and all private schools).

[15] For more on the DWMS survey instrument, see Lemos and Scur (2016) and https://developingmanagement.org/.

[16] The DWMS adaptation for Mexico included an additional dimension, leadership. This additional dimension responded to the government’s need to better align the DWMS instrument with the rules of operation of Escuela al Centro. All the analyses reported in this paper exclude the leadership dimension when constructing the overall DWMS index to ensure it is comparable with other settings.

[17] https://escuelaalcentro.com/ has a detailed timeline of when the different rounds of data collection took place in each state.
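As a stylized illustration of how such a composite can be built, the sketch below averages the four comparable dimensions, leaves leadership out, and standardizes the result; the averaging and z-scoring are our expository assumption, not the exact DWMS scoring protocol:

```python
import pandas as pd

# Stylized DWMS composite: average the four comparable dimensions (leadership
# excluded, per the text) and standardize. Scores and scheme are illustrative.
df = pd.DataFrame({
    "operations": [2.1, 1.8, 2.5],
    "monitoring": [2.0, 1.6, 2.4],
    "targets":    [1.9, 1.7, 2.2],
    "people":     [2.2, 1.5, 2.3],
    "leadership": [2.0, 1.9, 2.1],  # collected for SEP but left out of the index
})
dims = ["operations", "monitoring", "targets", "people"]
df["dwms"] = df[dims].mean(axis=1)
df["dwms_z"] = (df["dwms"] - df["dwms"].mean()) / df["dwms"].std()
print(df[["dwms", "dwms_z"]])
```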
School principals also completed two online surveys to assess the quality of the managerial training, one for each tool. The surveys included questions about different elements of the tools and their associated training. Since the surveys were not mandatory, many school principals did not complete them. Schools that answered the online surveys are statistically different from those that did not in several observable characteristics, including treatment status (see Tables A.9–A.12). For completeness, we report some basic statistics from these two online surveys. However, their information is not representative of our experimental sample due to sample selection (i.e., there is differential attrition across treatments and within each treatment); therefore, we exclude this data from our main analysis.
3.2.2 Secondary data
We use three types of secondary data. First, we measure student learning outcomes using PLANEA test scores. The exam was administered to grade 6 in June 2015 and June 2018. SEP gave the authors access to anonymized student-level data for both years for all schools in our sample. As part of registering their school for PLANEA, principals must fill out a survey (PLANEA-Contexto) that asks about their daily activities and the challenges they face. We use these surveys as a secondary measure of principals’ management practices and their exposure to the training.

Second, we gathered information on the location of each school from the PLANEA data. We used this information to match each school to its locality’s marginalization index, which accounts for deficiencies in education, housing, population, and household income.[18] Third, we use administrative school census data collected by federal and state-level education authorities, known as Formato 911. Since 1998, Formato 911 has been collected at the beginning and end of each school year. It gathers basic information on the number of students, the number of teachers and their qualifications, the school principal’s characteristics, the number of classrooms, and the school’s geographic location. This census data can be matched with the PLANEA data.[19]

[18] Consejo Nacional de Población (CONAPO) estimates this index.

[19] All the data used in this paper can be downloaded from www.xaber.org.mx.
3.3 Balance and attrition
Most student and school characteristics are balanced across treatment arms at baseline (see Table 1). The average school in our sample has 279 students, 9.4 teachers, and a pupil–teacher ratio of 29; 40% of schools are in rural areas and 38% are in areas categorized as poor or very poor by the government. The last two rows of the table show the fraction of schools for which we have endline DWMS and PLANEA data (in 2018). We have PLANEA data for nearly all schools (99%) and DWMS data for 77% of schools (due to damaged audio files from the interviews, as mentioned above). The proportion of schools with both PLANEA and DWMS data is balanced across treatments.
Table 1: Balance across treatment groups
(1) (2) (3)
Mean (SD) Difference
Train the trainer Direct training (2)-(1)
Students in math achievement L-IV (%) 7.79 8.36 0.56
(11.11) (12.25) (0.66)
Students in math achievement L-I (%) 60.00 60.17 0.19
(21.81) (22.24) (1.22)
Students in language achievement L-IV (%) 2.67 3.31 0.65∗∗
(3.86) (6.40) (0.30)
Students in language achievement L-I (%) 52.17 51.56 -0.60
(20.25) (20.52) (1.15)
Marginalization 0.38 0.38 -0.00
(0.49) (0.49) (0.02)
Urbanization 0.41 0.39 -0.02
(0.49) (0.49) (0.02)
Number of students 272.59 285.96 13.31
(163.74) (163.69) (8.87)
Number of teachers 9.27 9.63 0.36
(4.23) (4.39) (0.24)
Student-teacher ratio 28.34 28.89 0.54
(6.92) (7.18) (0.35)
DWMS endline missing 0.22 0.23 0.01
(0.41) (0.42) (0.02)
PLANEA endline missing 0.01 0.01 0.00
(0.08) (0.09) (0.00)
Observations 599 599 1,197
This table presents the means and standard deviations (in parentheses) for “train the trainer” (Column 1) and “direct training” schools (Column 2). The differences reported in Column 3 take into account the randomization design (i.e., they include strata fixed effects), and standard errors (in parentheses) are clustered at the school level. Achievement level (L) refers to the PLANEA 2015 exam results, which are scored from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas with “high” or “very high” marginalization according to CONAPO, and 0 otherwise. Urbanization is a variable coded 1 for schools located in an urban area, and 0 otherwise. The numbers of students and teachers are taken from Formato 911 for the 2015–2016 academic year. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
3.4 Compliance
To measure compliance with the evaluation’s original design, we compiled information on whether school principals reported attending the training sessions on the two tools. As mentioned above, since the characteristics of schools that answered the survey are different from those that did not (see Tables A.9–A.12), these results should be interpreted with caution. Due to the sample selection in the compliance measures and the inability to directly compare training hours across treatment arms (cascade vs. direct), local average treatment effect estimates that use treatment assignment as an instrument for the number of training hours principals report are difficult to interpret and likely biased.
While virtually no principals in “train the trainer” schools completed the full training on the use of either tool, less than half (40%) received some training (10–39 hours) through the cascade model (see Columns 1 and 2 of Table 2). About one-quarter of principals in “direct training” schools (20–25%) completed the training on both tools, and roughly 80% received some training from professionals. The difference between treatment groups is statistically significant both for completed training and for the indicator of some training. This is further supported by evidence from the surveys principals completed as part of the nationwide student standardized test (the PLANEA-Contexto surveys) in 2018. Specifically, “direct training” principals were more likely to have completed courses or received counseling on how to carry out school director duties in the past 12 months (see Panel A, Table A.4).
Table 2: Compliance across treatment groups
(1) (2) (3)
Mean (SD) Difference
Train the trainer Direct training (2)-(1)
Panel A: Stallings classroom observation tool
All training sessions (40 hours) 0.01 0.24 0.23∗∗∗
(0.10) (0.43) (0.02)
Some training sessions (10-40 hours) 0.39 0.86 0.44∗∗∗
(0.49) (0.35) (0.03)
Observations 304 533 837
Panel B: Foundational skills measurement tool (SisAT)
All training sessions (40 hours) 0.01 0.19 0.18∗∗∗
(0.09) (0.39) (0.02)
Some training sessions (10-40 hours) 0.32 0.72 0.39∗∗∗
(0.47) (0.45) (0.03)
Observations 402 464 866
This table presents the means and standard deviations (in parentheses) for “train the trainer” (Column 1) and “direct training” schools (Column 2). The differences reported in Column 3 take into account the randomization design (i.e., they include strata fixed effects), and standard errors (in parentheses) are clustered at the school level. Panel A reports whether the school principal attended the training sessions for the Stallings classroom observation tool (and for how many hours). Panel B reports whether the school principal attended the training sessions on SisAT (and for how many hours). ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
4 Results
4.1 Correlation between management (DWMS) and learning
We first explore the correlation between learning outcomes and the DWMS at baseline. We seek to replicate the analysis in Bloom, Lemos, et al. (2015) and compare the magnitude of the relationship between student learning outcomes and school management (as measured by the DWMS) in our data with that previously found in the literature.
In our data, better management quality, as measured by the DWMS, is only marginally correlated with better educational outcomes (see Table 3). A one-standard-deviation increase in the DWMS index is associated with an increase of 0.00σ–0.02σ in student test scores. We follow Bloom, Lemos, et al. (2015) and control for the number of pupils in the school, the pupil–teacher ratio, and the marginalization index (Column 4). We also control for measurement error by adding interviewer fixed effects (Column 5). The point estimate is robust to various controls and is never statistically significant. By comparison, Bloom, Lemos, et al. (2015) find that a one-standard-deviation increase in the WMS index is associated with an increase in pupil outcomes of 0.2σ–0.4σ. In Brazil, the setting in their study closest to Mexico, a one-standard-deviation increase in the WMS index is associated with an increase in pupil outcomes of 0.104σ. Overall, then, we find a lower correlation between outcomes and management than previously documented in other countries.
Of the four components of the DWMS (operations, monitoring, targets, and people), targets is the most closely correlated with student outcomes, followed by monitoring and people; none of them shows a statistically significant correlation with test scores in our setting (see Table A.5).
Table 3: Association between DWMS and test scores at baseline (all schools in the sample)
(1) (2) (3) (4) (5)
PLANEA 2015 scores
DWMS 0.0017 0.011 0.020 0.017 -0.0065
(0.025) (0.025) (0.023) (0.022) (0.027)
No. of obs. 20,680 20,680 20,680 20,049 20,049
State FE No Yes Yes Yes Yes
Strata FE No No Yes Yes Yes
Controls No No No Yes Yes
Enumerator FE No No No No Yes
This table presents the conditional correlation between the DWMS and student test scores at baseline across all schools in our sample. State FE indicates whether state fixed effects are included. Strata FE indicates whether strata fixed effects are included. Controls indicates whether the regression controls for the number of pupils in the school, the pupil–teacher ratio, and the marginalization index. Enumerator FE indicates whether interviewer dummies are included. Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
4.2 Experimental results
Our main estimating equation for student-level outcomes is:

Y_isg = α_g + γ_1 DirectTraining_s + ε_isg    (1)

where Y_isg is the outcome of interest of student i in school s in group g (denoting the stratification group used to assign treatment), α_g are strata fixed effects, DirectTraining_s indicates whether school s received training directly provided by professional trainers, and ε_isg is an error term. We use a similar specification without the i subscript to examine school-level outcomes. We estimate these models using ordinary least squares, clustering standard errors at the school level. γ_1 is the coefficient of interest and reflects the difference between the two types of training.
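A minimal sketch of this specification in Python, assuming a student-level dataset with hypothetical columns test_score, direct_training (0/1), stratum, and school_id; this illustrates the regression, and is not the authors’ code:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Equation (1): outcome on the direct-training indicator with strata fixed
# effects; standard errors clustered at the school level.
df = pd.read_csv("students.csv")  # hypothetical input file

fit = smf.ols("test_score ~ direct_training + C(stratum)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school_id"]}
)
print(fit.params["direct_training"], fit.bse["direct_training"])  # gamma_1 and its SE
```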
Table 4: Effects on the DWMS and on learning outcomes
Panel A: DWMS and its components
(1) (2) (3) (4) (5) (6)
DWMS Operations Monitoring Targets People Leadership
Direct training 0.13∗∗ 0.14 0.13∗∗ 0.027 0.093 -0.0091
(0.053) (0.056) (0.060) (0.052) (0.056) (0.060)
No. of obs. 913 913 913 913 913 911
Panel B: Learning outcomes
(1) (2) (3) (4)
Math Language Average PCA
Direct training 0.031 0.027 0.035 0.035
(0.029) (0.027) (0.029) (0.029)
No. of obs. 39,263 39,665 37,958 37,958
Panel A presents the treatment effects on management practices (measured using the DWMS). The outcome in Column 1 is the composite index of management practices, while Columns 2–5 display the outcomes for individual components of the management index. Finally, Column 6 shows the additional dimension, leadership; the SEP asked for this dimension to be measured in addition to the four traditional components of the DWMS. The overall DWMS index used in Column 1 excludes the leadership dimension to ensure comparability with other settings. Panel B presents the treatment effects on learning outcomes (measured using PLANEA scores). The outcomes are math test scores (Column 1), language test scores (Column 2), the average across subjects (Column 3), and a composite index across subjects (Column 4). All regressions account for the randomization design (i.e., they include strata fixed effects). Panel A regressions also include enumerator fixed effects. Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Overall, the direct training intervention improved management practices relative to the indirect training (see Panel A, Table 4). Management scores in schools that received direct training were 0.13σ (p-value 0.018) higher than in “train the trainer” schools. Therefore, our results show that it pays off to invest in professional trainers to improve school principals’ management capacities.[20]
Given the nature of the intervention (direct vs. indirect training on the Stallings and SisAT tools), it is not surprising that the “Operations” and “Monitoring” dimensions improve the most. “Operations” partially measures whether there is data-driven planning, as well as personalization of instruction and learning, goals the Stallings and SisAT tools specifically support. Likewise, “Monitoring” partially measures whether school performance is measured frequently and appropriately (SisAT does this for students, and Stallings does it for teachers). Given the limitations principals face in dismissing or promoting teachers, it is not surprising that the treatment effect on “People/talent management” is lower. However, measuring teachers’ performance (via Stallings) enables principals to provide soft incentives (e.g., better teaching assignments or non-pecuniary rewards).[21]
While management practices improved as a result of the direct training intervention, test scores did not (see Panel B, Table 4). Students in “direct training” schools scored 0.03σ (p-value 0.24) higher than those in “train the trainer” schools. We can rule out, at the 95% confidence level, the possibility that test scores increased by more than 0.09σ with respect to “train the trainer” schools. This result is robust to a series of student- and school-level controls (see Table A.6). Including controls allows us to rule out an effect greater than 0.08σ at the 95% level. Finally, there is no evidence that the “direct training” affected other outcomes such as grade repetition or enrollment rates (see Table A.8).
[20] According to surveys administered to principals as part of the nationwide student standardized test (the PLANEA-Contexto surveys), in 2018 “direct training” principals were not more likely than those trained using the cascade method to undertake activities to improve learning outcomes, observe classroom teaching, help teachers improve their pedagogical practices, or provide parents with school and student performance information (see Panel B, Table A.4). However, these self-reported measures are likely inflated by social desirability bias, given the (likely unrealistic) high proportion of principals who report doing these activities often or very often. Thus, we do not believe the differences between “train the trainer” and “direct training” schools in these self-reported measures accurately reflect treatment effects.

[21] We further explore whether it is reasonable to expect that providing training on two tools would improve managerial practices in Section A.2. We address this question by looking at the correlation between the self-reported use of the Stallings classroom observation and SisAT tools and the DWMS. We find that “direct training” schools are more likely to use the management tools provided to them, and the use of these tools is correlated with the DWMS. However, since schools that answered these surveys are statistically different from those that did not in several observable characteristics, including treatment status (see Tables A.9–A.12), these correlations may be biased and are presented for completeness.
4.3 Discussion: The lack of effect of direct training on test scores
As mentioned above, Bloom, Lemos, et al. (2015) find that a one-standard-deviation increase in the WMS index is associated with an increase in pupil outcomes of 0.2σ–0.4σ. The evidence from our baseline shows a weaker correlation between management practices and test scores. Thus, even optimistically assuming that a one-standard-deviation increase in management practices generates a treatment effect of 0.4σ on student learning, an increase of 0.13σ in management practices should yield an increase in test scores of 0.029σ; the actual treatment effect was 0.03σ.
We also estimate the effect of an increase in the DWMS index on test scores using the treatment assignment as an instrument for the DWMS index. While this requires the strong assumption that the DWMS completely captures any possible effect of the treatment on test scores, it provides a different benchmark of the plausible causal effect of improvements in management practices on test scores. The instrumental variable approach suggests that increasing the DWMS by one standard deviation increases test scores by 0.49σ (see Table A.7). This implies an expected increase of 0.065σ in test scores, given the treatment effect on DWMS scores.
Further, the components of the DWMS index that Bloom, Lemos, et al. (2015) find to be most strongly associated with test scores are the ones where the direct training intervention improved management practices the least relative to the indirect training (see Columns 2–4 in Panel A of Table 4). Specifically, the treatment effects on the two components with the highest association with learning outcomes (“people/talent management” and “target setting”) are the lowest.[22]
Overall, the expected treatment effects on learning outcomes (given the treatment effects on management practices) are of the same order of magnitude as the actual treatment effects. While the direct training intervention improved management practices relative to the indirect training, these improvements did not generate statistically significant changes in learning outcomes (even with a sample size of 1,198 schools). However, we cannot rule out the possibility that management had a small positive impact on learning.
Given the low overall attendance rate at the training workshops (see Section 3.4), we explore whether increasing participation in the training workshops would result in further improvements in management practices and larger learning gains. To answer this question, we use an instrumental variable approach to study the effects of attending more training workshops. Specifically, we instrument attendance at the training workshops with whether a school was randomly assigned to “direct training”.

[22] Bloom, Lemos, et al. (2015) find that, of the four components of the DWMS, “people/talent management” had the highest association with test scores (an increase of one standard deviation in the “people/talent management” score was associated with an increase of 0.257 standard deviations in pupil test scores), followed by “target setting” (associated with an increase of 0.158 standard deviations), “monitoring” (associated with an increase of 0.133 standard deviations), and “operations” (associated with an increase of 0.093 standard deviations).
However, we face a trade-off between two different approaches to measuring workshop attendance. We could use the PLANEA-Contexto surveys, which all principals answered, but which do not ask about the training workshops from our program specifically, only about any courses or counseling on how to carry out school director duties in the past. On the other hand, using the online surveys to measure (self-reported) attendance at this program’s training workshops likely induces sample selection bias, since the characteristics of schools that answered the survey differ from those that did not. We report both. While neither approach is perfect, both suggest similar results.
Using the PLANEA-Contexto surveys suggests that attending any courses or counseling on how to carry out school director duties increases both management practices and learning outcomes (see Panel B, Table 5). The local average treatment effects (LATE) here represent the effects of attending any workshops, not just those related to our program, for the compliers who are more likely to attend a workshop due to the “direct training” treatment. While attending any courses or counseling on how to carry out school director duties is likely to capture a significant portion of the effect of “direct training”, it is unlikely to be the only channel through which the treatment affects outcomes, which is a necessary condition for the LATE to be valid.
Using the online surveys suggests that attending the training workshops from this program increases both management practices and learning outcomes (see Panel A, Table 5). However, the local average treatment effects (LATE) are likely biased due to sample selection caused by differential attrition in the survey. In addition, as mentioned above, the training hours across the treatment arms (cascade vs. direct) are not directly comparable.

Overall, while both approaches have limitations, they suggest that one way to boost the intervention’s impact on management practices and learning outcomes would be to increase principals’ attendance at the training workshops.
Table 5: Effects of principals’ attendance at the training workshops
(1) (2) (3) (4)
DWMS DWMS PLANEA PLANEA
Panel A: Online surveys
Attended >10 hrs of training 0.36∗∗∗ 0.15
(0.11) (0.070)
Attended all trainings 0.69∗∗ 0.28
(0.21) (0.13)
N. of obs. 808 808 28,906 28,906
F test (first stage) 292 143 240 138
Panel B: PLANEA - Contexto
Ever 1.1 .68
(.55) (.39)
Past 12 months 1∗∗ .56
(.5) (.31)
N. of obs. 850 850 29,731 29,731
F test (first stage) 16 26 13 30
Panel A presents the effect of a principal attending at least 10 hours of training on the DWMS score (Column 1) and the overall PLANEA score (Column 3), as well as the effect of a principal attending all training sessions on the DWMS score (Column 2) and the overall PLANEA score (Column 4). Attendance (in both cases) is instrumented with the treatment allocation. The F statistic of the first stage is presented in the bottom row (see Table 2 for details on the first stage). Columns 1–2 use data at the school level, while Columns 3–4 use data at the student level. Attendance is measured using the online surveys, which have differential attrition across treatments (see Tables A.9–A.12). Panel B presents the effect of a principal ever attending a training workshop (on any topic related to his or her duties) on the DWMS score (Column 1) and the overall PLANEA score (Column 3), as well as the effect of a principal attending such a workshop in the past 12 months on the DWMS score (Column 2) and the overall PLANEA score (Column 4). Attendance (in both cases) is instrumented with the treatment allocation. The F statistic of the first stage is presented in the bottom row (see Table A.4 for details on the first stage). Columns 1–2 use data at the school level, while Columns 3–4 use data at the student level. Attendance is measured using the PLANEA-Contexto surveys, which do not have differential attrition across treatments (see Table 1). All regressions account for the randomization design (i.e., they include strata fixed effects) and include enumerator fixed effects. Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
4.4 Heterogeneity
This section explores heterogeneous treatment effects on management practices by schools’ (and principals’) baseline characteristics. Overall, there is little evidence of heterogeneity. Specifically, we estimate the following equation:

Y_isg = α_g + β_1 treatment_s + β_2 treatment_s × c_s + β_3 c_s + ε_isg    (2)

where c_s denotes the school characteristic along which we wish to measure heterogeneity, and β_2 allows us to test whether there is any differential treatment effect. Everything else is as in Equation 1. We study heterogeneity by schools’ baseline management quality and marginalization index, and by principals’ gender and tenure. Overall, we find no evidence of heterogeneity in management practices (DWMS) or learning outcomes (see Tables A.13 and A.14).
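A sketch of Equation (2) in the same framework as Equation (1), where the interaction coefficient corresponds to β_2; the baseline characteristic shown (baseline DWMS) and the column names are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Equation (2): treatment, a baseline characteristic c_s, and their interaction,
# with strata fixed effects and school-clustered SEs. Columns are hypothetical.
df = pd.read_csv("students.csv")

fit = smf.ols("test_score ~ direct_training * baseline_dwms + C(stratum)",
              data=df).fit(cov_type="cluster", cov_kwds={"groups": df["school_id"]})
print(fit.params["direct_training:baseline_dwms"])  # beta_2: differential effect
```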
We also study whether there is heterogeneity by whether the school’s principal changed between 2015 and 2018. We first verify that the treatment did not affect principal turnover itself (see Table A.15), but note that roughly 43% of schools changed principals at some point in those three years. While high principal turnover may be a barrier to improving learning outcomes (Miller, 2013; Bartanen, Grissom, & Rogers, 2019), there is no heterogeneity in treatment effects on management practices or learning outcomes by principal turnover (see Table A.16).
5 Conclusions
Recent studies have identified the pivotal role that managerial practices play in helping an organization achieve its objectives (Bender, Bloom, Card, Van Reenen, & Wolter, 2018), and the education sector is no exception. This paper reports some of the first experimental evidence on the relative effectiveness of two interventions to improve school management in a developing country. We randomly assigned a group of public primary schools in seven Mexican states to receive training either directly from professional trainers or through a “train the trainer” cascade model. Compared to indirect training, direct training improved school principals’ managerial capacity but failed to significantly improve learning outcomes. To improve student learning in the short term, a management intervention may need to have a greater impact on school principals’ managerial capacities.

However, given the cost of the “direct training” intervention (USD 470 per school; see Appendix A.4), the marginal dollar in Mexico might be better spent on interventions that focus on improving pedagogy (e.g., teaching at the right level, teacher content and pedagogical training) and improving teacher accountability (Kremer, Brannen, & Glennerster, 2013; Glewwe & Muralidharan, 2016; Snilstveit et al., 2016).
References
Álvarez, J., García-Moreno, V., & Patrinos, H. A. (2007). Institutional effects as determinants of learning outcomes: Exploring state variations in Mexico (Vol. 4286). World Bank Publications.
Bartanen, B., Grissom, J. A., & Rogers, L. K. (2019). The impacts of principal turnover. Educational Evaluation and Policy Analysis, 41(3), 350–374. doi: 10.3102/0162373719855044
Bender, S., Bloom, N., Card, D., Van Reenen, J., & Wolter, S. (2018). Management practices, workforce selection, and productivity. Journal of Labor Economics, 36(S1), S371–S409. doi: 10.1086/694107
Bloom, N., Eifert, B., Mahajan, A., McKenzie, D., & Roberts, J. (2013). Does management matter? Evidence from India. The Quarterly Journal of Economics, 128(1), 1–51.
Bloom, N., Lemos, R., Sadun, R., Scur, D., & Van Reenen, J. (2014). The new empirical economics of management. Journal of the European Economic Association, 12(4), 835–876.
Bloom, N., Lemos, R., Sadun, R., & Van Reenen, J. (2015). Does management matter in schools? The Economic Journal, 125(584), 647–674.
Bloom, N., Propper, C., Seiler, S., & Van Reenen, J. (2015). The impact of competition on management quality: Evidence from public hospitals. Review of Economic Studies, 82(2), 457–489.
Bloom, N., & Van Reenen, J. (2007). Measuring and explaining management practices across firms and countries. The Quarterly Journal of Economics, 122(4), 1351–1408. doi: 10.1162/qjec.2007.122.4.1351
Bruns, B., & Luque, J. (2014). Great teachers: How to raise student learning in Latin America and the Caribbean. World Bank Publications.
de Hoyos, R., Ganimian, A. J., & Holland, P. A. (2019). Teaching with the test: Experimental evidence on diagnostic feedback and capacity building for public schools in Argentina. The World Bank Economic Review.
de Hoyos, R., Ganimian, A. J., & Holland, P. A. (2020). Great things come to those who wait: Experimental evidence on performance-management tools and training in public schools in Argentina.
de Hoyos, R., García-Moreno, V., & Patrinos, H. A. (2017). The impact of an accountability intervention with diagnostic feedback: Evidence from Mexico. Economics of Education Review, 58, 123–140.
Dirección General de Planeación, Programación y Estadística Educativa. (2018). Sistema educativo de los Estados Unidos Mexicanos, principales cifras 2017–2018 (Tech. Rep.). Secretaría de Educación Pública. Retrieved from https://www.planeacion.sep.gob.mx/estadisticaeindicadores.aspx
Dobbie, W., & Fryer, R. (2013, October). Getting beneath the veil of effective schools: Evidence from New York City. American Economic Journal: Applied Economics, 5(4), 28–60. doi: 10.1257/app.5.4.28
Elacqua, G., Iribarren, M. L., & Santos, H. (2018). Private schooling in Latin America: Trends and public policies (Tech. Rep.). Inter-American Development Bank. doi: 10.18235/0001394
Fryer, R. (2014). Injecting charter school best practices into traditional public schools: Evidence from field experiments. The Quarterly Journal of Economics, 129(3), 1355–1407.
Fryer, R. (2017, May). Management and student achievement: Evidence from a randomized field experiment (Working Paper No. 23437). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w23437 doi: 10.3386/w23437
Glewwe, P., & Muralidharan, K. (2016). Chapter 10: Improving education outcomes in developing countries: Evidence, knowledge gaps, and policy implications. In E. A. Hanushek, S. Machin, & L. Woessmann (Eds.), Handbook of the Economics of Education (Vol. 5, pp. 653–743). Elsevier.
Hopkins, D., Ahtaridou, E., Matthews, P., Posner, C., & Figueroa, D. (2007). Reflections
on the performance of the Mexican education system. OCDE. Directorate for Edu-
cation, disponible en www. sep. gob. mx/work/models/sep1/Resource/93128/5/Mex PISA-
OCDE2006. pdf .
INEGI. (2018). Marco geoestad´ıstico de Mexico. Retrieved 06/01/2018, from https://
www.inegi.org.mx/temas/mg/default.html#
Instituto Nacional para la Evaluación de la Educación. (2018). Resultados de PLANEA (Tech. Rep.). Retrieved from https://www.inee.edu.mx/evaluaciones/planea/resultados-planea/
Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2017). The power of bias in economics research. The Economic Journal, 127(605), F236–F265. doi: 10.1111/ecoj.12461
Kremer, M., Brannen, C., & Glennerster, R. (2013). The challenge of education and
learning in the developing world. Science,340(6130), 297–300.
Lemos, R., Muralidharan, K., & Scur, D. (2021, January). Personnel management and school productivity: Evidence from India (Working Paper No. 28336). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w28336 doi: 10.3386/w28336
Lemos, R., & Scur, D. (2016). Developing management: An expanded evaluation tool for
developing countries. RISE Working Paper, 16(007).
McKenzie, D., & Ozier, O. (2019). Why ex-post power using estimated effect sizes is bad, but an ex-post MDE is not. World Bank Development Impact Blog.
Miller, A. (2013). Principal turnover and student achievement. Economics of Education
Review,36, 60–72.
Muralidharan, K., & Singh, A. (2020, November). Improving public sector management at scale? Experimental evidence on school governance in India (Working Paper No. 28129). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w28129 doi: 10.3386/w28129
OECD. (2016). PISA 2015 results (volume II): Policies and practices for successful schools.
OECD Publishing.
Popova, A., Evans, D. K., Breeding, M. E., & Arancibia, V. (2018). Teacher professional
development around the world: The gap between evidence and practice (Tech. Rep.). The
World Bank.
Santiago, P., McGregor, I., Nusche, D., Ravela, P., & Toledo, D. (2012). OECD reviews of evaluation and assessment in education: Mexico 2012. OECD Publishing. doi: 10.1787/9789264172647-en
Secretaría de Educación Pública, & Banco Internacional de Reconstrucción y Fomento. (2015). Evaluación de impacto del ejercicio y desarrollo de la autonomía de gestión escolar y estrategia de intervención controlada (Tech. Rep.). Retrieved from http://escuelaalcentro.com/wp-content/uploads/2018/04/Documento-base-Evaluaci%C3%B3n-de-Impacto.pdf
Snilstveit, B., Stevenson, J., Menon, R., Phillips, D., Gallagher, E., Geleen, M., . . . Jimenez, E. (2016). The impact of education programmes on learning and school participation in low- and middle-income countries.
Stallings, J. (1977). Learning to look: A handbook on classroom observation and teaching models. Wadsworth Pub. Co. Retrieved from https://books.google.com.mx/books?id=QEglAQAAIAAJ
Stallings, J., & Mohlman, G. (1988). Classroom observation techniques. In J. Keeves (Ed.), Educational research, methodology and measurement: An international handbook. Elsevier Science & Technology Books.
World Bank. (2007). What is school-based management? (Tech. Rep. No. 44922). Retrieved
from http://documents.worldbank.org/curated/en/113901468140944134/
What-is-school-based-management
World Bank. (2017a). Primary completion rate, total (% of relevant age group). (data retrieved
from World Development Indicators, https://data.worldbank.org/indicator/SE
.PRM.CMPT.ZS?locations=MX)
World Bank. (2017b). School enrollment, primary (% net). (data retrieved from
World Development Indicators, https://data.worldbank.org/indicator/SE.PRM
.NENR?locations=MX)
A Online Appendix for “School Management, Grants, and Test Scores: Experimental Evidence from Mexico” by Bedoya, de Hoyos, Romero, Silveyra and Yanez-Pagans
A.1 Additional tables and figures
Figure A.1: States participating in the impact evaluation
Note: Geographical information on the administrative areas of Mexico comes from INEGI (2018). Figure A.3 provides
the distribution of schools within each state.
Figure A.2: Distribution of management practices in Brazil, India, Mexico, and the US
[Figure: kernel densities of the management score (1–4 scale) for Brazil, India, the US, and Mexico.]
Note: Distribution of management practices from Brazil, India, and the US is based on the replication data of Bloom, Lemos, et al. (2015). The distribution of management practices from Mexico comes from our baseline data collected in 2015.
Table A.1: Balance between schools in the experimental sample and other schools
(1) (2) (3)
Mean Difference
Variable Participant Non-participant (1)-(2)
Students in math achievement L-IV (%) 8.08 10.95 -2.87∗∗∗
(11.70) (17.79) (0.36)
Students in math achievement L-I (%) 60.07 54.96 5.11∗∗∗
(22.01) (28.04) (0.67)
Students in language achievement L-IV (%) 2.99 4.93 -1.94∗∗∗
(5.28) (10.68) (0.17)
Students in language achievement L-I (%) 51.83 47.63 4.21∗∗∗
(20.38) (28.02) (0.62)
Marginalization 0.58 0.53 0.05∗∗∗
(0.49) (0.50) (0.01)
Urbanization 0.40 0.37 0.03∗∗
(0.49) (0.48) (0.01)
Number of students 279.55 196.79 82.75∗∗∗
(163.88) (199.19) (4.94)
Number of teachers 9.45 6.56 2.89∗∗∗
(4.31) (5.72) (0.13)
Student-teacher ratio 28.63 29.42 -0.79∗∗∗
(7.05) (11.09) (0.22)
Observations 1,194 20,611 21,805
This table presents the mean and standard deviation (in parentheses) for schools in the experiment (Column 1) and schools not in the experiment (Column 2). Column 3 shows the mean difference between participant and non-participant schools, as well as the standard error of the difference, clustered at the school level. Achievement level (L) refers to PLANEA exam results, which are scored from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas that have “high” or “very high” marginalization, and 0 otherwise, according to CONAPO. Urbanization is coded 1 for schools located in an urban area, and 0 otherwise. The number of students and teachers is taken from Formato 911 from the year 2015. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.2: Balance between schools with and without a DWMS audio file
(1) (2) (3)
Mean Difference
Variable No DWMS Audio DWMS Audio (1)-(2)
Direct training 0.50 0.52 -0.02
(0.50) (0.50) (0.04)
Students in math achievement L-IV (%) 8.00 8.38 -0.61
(11.78) (11.45) (0.78)
Students in math achievement L-I (%) 60.17 59.73 0.93
(21.73) (22.98) (1.49)
Students in language achievement L-IV (%) 2.95 3.10 -0.25
(5.27) (5.33) (0.37)
Students in language achievement L-I (%) 52.04 51.11 1.72
(20.03) (21.59) (1.43)
Marginalization 0.58 0.55 0.02
(0.49) (0.50) (0.03)
Urbanization 0.38 0.47 -0.06
(0.49) (0.50) (0.03)
Number of students 280.56 276.03 3.45
(161.77) (171.27) (11.03)
Number of teachers 9.43 9.55 0.08
(4.27) (4.48) (0.30)
Student-teacher ratio 28.90 27.69 0.41
(6.98) (7.22) (0.43)
Observations 267 927 1,194
This table presents the mean and standard deviation (in parentheses) for schools without audio for the DWMS endline interview (Column 1) and schools with it (Column 2). Column 3 shows the mean difference between both types of schools, as well as the standard error of the difference, clustered at the school level. Achievement level (L) refers to PLANEA exam results, which are scored from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas that have “high” or “very high” marginalization, and 0 otherwise, according to CONAPO. Urbanization is a variable coded 1 for schools located in an urban area and 0 otherwise. The number of students and teachers is taken from Formato 911 for the year 2015. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.3: Differences in the likelihood of a usable DWMS audio file at endline, by school characteristics
(1) (2) (3)
DWMS Audio
Direct training -0.014 -0.045
(0.024) (0.11)
Students in math achievement L-IV (%) -0.00076 0.00011
(0.0012) (0.0016)
Students in language achievement L-IV (%) -0.00052 0.0022
(0.0028) (0.0047)
Marginalization -0.0040 0.018
(0.031) (0.043)
Urbanization -0.073∗∗ -0.11∗∗
(0.033) (0.045)
Student-teacher ratio 0.0031 0.0022
(0.0021) (0.0029)
Direct training ×Students in math achievement L-IV (%) -0.0017
(0.0023)
Direct training ×Students in language achievement L-IV (%) -0.0030
(0.0059)
Direct training ×Marginalization -0.044
(0.055)
Direct training ×Urbanization 0.064
(0.059)
Direct training ×Student-teacher ratio 0.0017
(0.0037)
No. of obs. 1,194 1,193 1,193
This table presents the association between school characteristics and the likelihood that the audio recording of the endline DWMS interview was usable. Achievement level (L) refers to PLANEA exam results, which are scored from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas that have “high” or “very high” marginalization, and 0 otherwise, according to CONAPO. Urbanization is a variable coded 1 for schools located in an urban area and 0 otherwise. The number of students and teachers is taken from Formato 911 for the year 2015. All regressions take into account the randomization design (i.e., include strata fixed effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.4: Principal self-reported information from 2018 PLANEA-Contexto surveys
(1) (2) (3)
Mean (SD) Difference
Train the trainer Direct training (2)-(1)
Panel A: Courses or counseling on how to carry out school director duties
Ever 0.74 0.85 0.11∗∗∗
(0.44) (0.36) (0.02)
Past 12 months 0.85 0.94 0.08∗∗∗
(0.35) (0.24) (0.02)
Panel B: Actions taken often or very often during the past 12 months
Activities to improve learning 0.79 0.77 -0.02
(0.41) (0.42) (0.03)
Classroom observations 0.71 0.72 0.01
(0.45) (0.45) (0.03)
Help teachers improve pedagogical practices 0.79 0.81 0.03
(0.41) (0.39) (0.02)
Provide parents with performance information 0.92 0.93 0.01
(0.27) (0.26) (0.02)
Observations 565 545 1,110
This table presents the means and standard deviations (in parentheses) for “train the trainer” (Column 1) and “direct training” schools (Column 2). Column 3 presents the differences between groups, taking into account the randomization design (i.e., including strata fixed effects); standard errors (in parentheses) are clustered at the school level. Data come from PLANEA-Contexto questionnaires completed by school principals. Panel A includes self-reported information about courses or counseling on carrying out school director duties, ever taken or taken in the past 12 months. Panel B indicates how often the principal engages in different practices (often and very often are coded 1, while sometimes and never are coded 0). “Activities to improve learning” is the principal’s self-reported frequency of taking any action to improve learning or the curriculum, including classroom observations, teacher evaluations, student evaluations, and acting as a tutor for teachers to improve pedagogical practices. “Classroom observations” is the self-reported frequency of such observations. “Help teachers improve pedagogical practices” is the self-reported frequency with which the principal helps teachers improve their pedagogical practices. “Provide parents with performance information” is the self-reported frequency with which the principal provides parents with school and student performance information. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.5: Association between DWMS components and test scores at baseline across all
schools
(1) (2) (3) (4)
PLANEA 2015 scores
Operations -0.020
(0.025)
Monitoring 0.0061
(0.024)
Targets 0.00039
(0.026)
People -0.0087
(0.028)
No. of obs. 20,049 20,049 20,049 20,049
This table presents the conditional correlation between
DWMS components and student test scores at baseline
across all schools. All regressions control for strata fixed ef-
fects, the number of pupils in the school, the pupil–teacher
ratio, the marginalization index, and enumerator fixed ef-
fects. Standard errors are clustered at the school level.
∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.6: Effects on learning outcomes
(1) (2) (3) (4)
PCA score
Direct training 0.035 0.032 0.032 0.035
(0.029) (0.025) (0.025) (0.024)
No. of obs. 37,958 37,958 37,958 37,958
Lagged scores No Yes Yes Yes
Student controls No No Yes Yes
School controls No No No Yes
This table presents the treatment effects on learning outcomes
(measured using PLANEA scores). The outcome is a compos-
ite index across subjects. All regressions take into account the
randomization design (i.e., include strata fixed effects). “Lagged
scores” indicates whether school average test scores from 2015
are included as controls. “Student controls” indicates whether
age and gender are included as controls. “School controls” indi-
cates whether the following controls are included: whether the
school has a day shift, whether a primary school is intended to
serve an indigenous population, the school’s age, whether the
school is located in an urban area, and the marginalization index
of the school’s municipality. Standard errors are clustered at the
school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
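To make the construction of this outcome concrete, the following minimal Python sketch (on simulated data) builds a first-principal-component index from standardized math and language scores and re-standardizes it. All names and magnitudes are illustrative assumptions, not the study's data.

import numpy as np
from sklearn.decomposition import PCA

# Simulated, positively correlated subject scores (illustrative only).
rng = np.random.default_rng(4)
math = rng.normal(0, 1, 1000)
lang = 0.7 * math + rng.normal(0, 0.7, 1000)

scores = np.column_stack([math, lang])
scores = (scores - scores.mean(0)) / scores.std(0)   # standardize each subject
pc1 = PCA(n_components=1).fit_transform(scores).ravel()
pc1 = (pc1 - pc1.mean()) / pc1.std()                 # standardize the index

# With two positively correlated standardized subjects, the first
# component is (up to sign) proportional to their average.
print(round(abs(np.corrcoef(pc1, scores.mean(axis=1))[0, 1]), 3))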
Table A.7: Effect of DWMS on learning outcomes: Instrumental variable approach
(1) (2) (3) (4)
Math Language Average PCA
DWMS 0.39 0.45 0.49 0.49
(0.30) (0.31) (0.34) (0.34)
No. of obs. 30,956 31,270 29,926 29,926
This table presents the effects of increasing the DWMS on
learning outcomes (measured using PLANEA scores). We in-
strument the DWMS with the treatment assignment. The un-
derlying assumption is that the DWMS completely captures any
effect the treatment assignment might have on test scores. The
first stage is presented in Panel A, Table 4. The outcomes are
math test scores (Column 1), language test scores (Column 2),
the average across subjects (Column 3), and a composite in-
dex across subjects (Column 4). All regressions take into ac-
count the randomization design (i.e., include strata fixed ef-
fects). Standard errors are clustered at the school level.
∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
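To illustrate the mechanics behind this just-identified IV column, here is a minimal Python sketch on simulated data: with a single binary instrument (the treatment assignment) and one endogenous regressor (the DWMS index), 2SLS collapses to the Wald ratio of the reduced-form effect to the first-stage effect. The magnitudes below are illustrative assumptions; note that a first stage of roughly 0.13σ mechanically inflates the IV standard error, consistent with the wide confidence intervals in this table.

import numpy as np

rng = np.random.default_rng(0)
n = 1198                                  # schools (as in the experiment)
z = rng.integers(0, 2, n)                 # treatment assignment (instrument)
m = 0.13 * z + rng.normal(0, 1, n)        # DWMS index; first stage ~ 0.13 sd
y = 0.0 * m + rng.normal(0, 1, n)         # test scores; no true effect assumed

first_stage = m[z == 1].mean() - m[z == 0].mean()    # effect of Z on DWMS
reduced_form = y[z == 1].mean() - y[z == 0].mean()   # effect of Z on scores
wald_iv = reduced_form / first_stage                  # 2SLS = Wald ratio here

print(f"first stage:  {first_stage:.3f}")
print(f"reduced form: {reduced_form:.3f}")
print(f"IV estimate:  {wald_iv:.3f}")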
Table A.8: Effects on other outcomes
(1) (2) (3)
Pass rate Repetition rate Enrollment
Direct training 0.0308 0.0251 13.65
(0.253) (0.114) (8.734)
No. of obs. 1,186 1,185 1,192
Control mean 99.22 0.72 258.49
This table presents the treatment effects on the percentage of students
who successfully complete their grade and can progress to the next
one (pass rate in Column 1), the percentage of students that repeat a
grade (Column 2), and the total number of students enrolled (Column
3). All outcomes refer to the 2017–2018 school year. All regressions
take into account the randomization design (i.e., include strata fixed
effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.9: Balance between schools that answered the Stallings implementation survey and those that did not
(1) (2) (3)
Mean Difference
Variable Answered No survey
survey answer (1)-(2)
Direct training 0.64 0.17 0.51∗∗∗
(0.48) (0.38) (0.03)
Students in math achievement L-IV (%) 8.41 7.30 1.85∗∗
(12.14) (10.56) (0.80)
Students in math achievement L-I (%) 60.07 60.06 -1.58
(21.98) (22.10) (1.45)
Students in language achievement L-IV (%) 3.12 2.68 0.75∗∗
(5.69) (4.13) (0.36)
Students in language achievement L-I (%) 51.34 53.03 -3.32
(19.95) (21.37) (1.36)
Marginalization 0.59 0.56 0.01
(0.49) (0.50) (0.03)
Urbanization 0.40 0.42 -0.02
(0.49) (0.49) (0.03)
Number of students 282.89 271.45 17.25
(164.83) (161.50) (9.82)
Number of teachers 9.58 9.14 0.45
(4.35) (4.20) (0.26)
Student-teacher ratio 28.60 28.71 0.50
(7.13) (6.87) (0.39)
Observations 845 349 1,194
This table presents the mean and standard deviation (in parentheses) for schools taking the Stallings implementation and use survey (Column 1) and schools not taking it (Column 2). Column 3 shows the mean difference between the two groups, as well as the standard error of the difference, clustered at the school level. Achievement level (L) refers to PLANEA exam scores, which range from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas that have “high” or “very high” marginalization, and 0 otherwise, according to CONAPO. Urbanization is a variable coded 1 for schools located in an urban area and 0 otherwise. The number of students and teachers is taken from Formato 911 for the year 2015. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.10: Likelihood of answering the Stallings implementation survey, by school characteristics
(1) (2) (3)
Answered Stallings’ survey
Direct training 0.39∗∗∗ 0.12
(0.023) (0.11)
Students in math achievement L-IV (%) 0.0019 0.0015
(0.0014) (0.0019)
Students in language achievement L-IV (%) 0.0032 -0.0038
(0.0025) (0.0055)
Marginalization 0.012 0.029
(0.032) (0.046)
Urbanization -0.026 -0.017
(0.032) (0.048)
Student-teacher ratio 0.0030 -0.0030
(0.0021) (0.0029)
Direct training ×Students in math achievement L-IV (%) 0.0017
(0.0025)
Direct training ×Students in language achievement L-IV (%) 0.0054
(0.0059)
Direct training ×Marginalization -0.026
(0.052)
Direct training ×Urbanization 0.028
(0.055)
Direct training ×Student-teacher ratio 0.0084∗∗
(0.0034)
No. of obs. 1,194 1,193 1,193
This table presents the association between school characteristics and the likelihood of answering the Stallings implementation survey. Achievement level (L) refers to PLANEA exam results, which are scored from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas that have “high” or “very high” marginalization, and 0 otherwise, according to CONAPO. Urbanization is a variable coded 1 for schools located in an urban area and 0 otherwise. The number of students and teachers is taken from Formato 911 for the year 2015. All regressions take into account the randomization design (i.e., include strata fixed effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.11: Balance between schools that answered the SisAT implementation survey and those that did not
(1) (2) (3)
Mean Difference
Variable Answered No survey
survey answer (1)-(2)
Direct training 0.53 0.41 0.13∗∗∗
(0.50) (0.49) (0.03)
Students in math achievement L-IV (%) 8.30 7.46 -0.35
(11.67) (11.81) (0.85)
Students in math achievement L-I (%) 59.35 62.17 -0.39
(22.04) (21.83) (1.51)
Students in language achievement L-IV (%) 3.08 2.73 0.17
(5.66) (3.96) (0.28)
Students in language achievement L-I (%) 51.54 52.70 -0.03
(20.40) (20.33) (1.38)
Marginalization 0.62 0.46 0.05
(0.49) (0.50) (0.03)
Urbanization 0.38 0.48 0.02
(0.49) (0.50) (0.03)
Number of students 281.89 272.72 15.97
(165.34) (159.63) (10.28)
Number of teachers 9.54 9.21 0.37
(4.36) (4.15) (0.27)
Student-teacher ratio 28.60 28.74 0.63
(7.08) (6.98) (0.40)
Observations 889 305 1,194
This table presents the mean and standard deviation (in parentheses) for schools taking the SisAT implementation and use survey (Column 1) and schools not taking it (Column 2). Column 3 shows the mean difference between the two groups, as well as the standard error of the difference, clustered at the school level. Achievement level (L) refers to PLANEA exam scores, which range from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas that have “high” or “very high” marginalization, and 0 otherwise, according to CONAPO. Urbanization is a variable coded 1 for schools located in an urban area and 0 otherwise. The number of students and teachers is taken from Formato 911 for the year 2015. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.12: Differences in the likelihood of answering the SisAT implementation survey, by school characteristics
(1) (2) (3)
Answered SisAT’s survey
Direct training 0.089∗∗∗ 0.16
(0.024) (0.11)
Students in math achievement L-IV (%) -0.00092 -0.0022
(0.0014) (0.0020)
Students in language achievement L-IV (%) 0.0024 0.0019
(0.0023) (0.0058)
Marginalization 0.061∗∗ 0.12∗∗∗
(0.031) (0.045)
Urbanization 0.023 0.084
(0.033) (0.048)
Student-teacher ratio 0.0029 0.0023
(0.0020) (0.0030)
Direct training ×Students in math achievement L-IV (%) 0.0027
(0.0026)
Direct training ×Students in language achievement L-IV (%) -0.00091
(0.0062)
Direct training ×Marginalization -0.11∗∗
(0.055)
Direct training ×Urbanization -0.10
(0.058)
Direct training ×Student-teacher ratio 0.00070
(0.0036)
No. of obs. 1,194 1,193 1,193
This table presents the association between school characteristics and the likelihood of answering the SisAT implementation survey. Achievement level (L) refers to PLANEA exam results, which are scored from L-I (lowest) to L-IV (highest). Marginalization is a variable coded 1 for areas that have “high” or “very high” marginalization, and 0 otherwise, according to CONAPO. Urbanization is a variable coded 1 for schools located in an urban area and 0 otherwise. The number of students and teachers is taken from Formato 911 for the year 2015. All regressions take into account the randomization design (i.e., include strata fixed effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.13: Heterogeneous effects on management
Management Principal Principal Marginalization
2015 gender tenure
Direct training 0.095 0.091 0.18∗∗∗ 0.22∗∗∗
(0.071) (0.069) (0.065) (0.075)
Direct training ×Covariate 0.070 0.080 -0.012 -0.16
(0.071) (0.11) (0.0087) (0.11)
Covariate 0.20∗∗∗ 0.024 0.0095 0.016
(0.053) (0.081) (0.0065) (0.086)
No. of obs. 511 913 913 913
Control mean 0.12 -0.05 -0.05 -0.05
This table shows the results from estimating Equation 2 when the outcome variable is the 2018 DWMS index. “Management 2015” refers to the index calculated with baseline information, “Principal gender” takes a value of 1 for female principals and 0 for males, “Principal tenure” refers to the number of years as principal, and “Marginalization” takes a value of 1 for schools located in areas with high or very high marginalization. All regressions take into account the randomization design (i.e., include strata fixed effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.14: Heterogeneous effects on learning
Student Management Principal Principal Marginalization
gender 2015 gender tenure
Direct training 0.033 0.046 0.0076 0.010 0.047
(0.030) (0.042) (0.037) (0.037) (0.044)
Direct training ×Covariate 0.0063 -0.052 0.056 0.0043 -0.027
(0.025) (0.042) (0.059) (0.0051) (0.057)
Covariate 0.22∗∗∗ 0.037 0.070∗ -0.00023 -0.21∗∗∗
(0.018) (0.026) (0.041) (0.0034) (0.043)
No. of obs. 37,958 19,112 37,958 37,867 37,958
Control mean -0.02 0.00 -0.02 -0.02 -0.02
This table shows the results from estimating Equation 2 when the outcome variable is the PCA index from math and language 2018 PLANEA scores. “Management 2015” refers to the index calculated with baseline information, “Principal gender” takes a value of 1 for female principals and 0 for males, “Principal tenure” refers to the number of years as principal, and “Marginalization” takes a value of 1 for schools located in areas with high or very high marginalization. All regressions take into account the randomization design (i.e., include strata fixed effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.15: Treatment effect on principal turnover
Turnover
Direct training -0.030
(0.031)
No. of obs. 1,010
Indirect training mean 0.43
This table presents the treatment effect on principal turnover (an indicator equal to 1 if there is a change in the school’s principal between 2015 and 2018). All regressions take into account the randomization design (i.e., include strata fixed effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.16: Heterogeneous effects of principal turnover
Panel A Management
Direct training 0.16∗∗
(0.073)
Direct training ×Principal change -0.087
(0.10)
Principal change 0.025
(0.073)
No. of obs. 909
Control mean -0.05
Panel B PLANEA 2018
Direct training 0.042
(0.039)
Direct training ×Principal change -0.027
(0.059)
Principal change -0.063
(0.042)
No. of obs. 37,465
Control mean -0.01
This table shows the results from estimating Equation 2 when the covariate is principal turnover (a change in the principal between 2015 and 2018). In Panel A, the outcome variable is the 2018 DWMS index; in Panel B, it is the 2018 PLANEA score. All regressions take into account the randomization design (i.e., include strata fixed effects). Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
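For reference, the heterogeneity specifications in Tables A.13–A.16 can be sketched as follows in Python on simulated school-level data: regress the outcome on treatment, the covariate, and their interaction, include strata fixed effects, and cluster standard errors at the school level. All variable names and magnitudes are hypothetical assumptions, not the paper's data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 900
df = pd.DataFrame({
    "school": np.arange(n),
    "stratum": rng.integers(0, 30, n),
    "treat": rng.integers(0, 2, n),
    "covariate": rng.integers(0, 2, n),   # e.g., a principal-turnover indicator
})
# Simulated outcome: a treatment effect that shrinks when covariate == 1.
df["dwms"] = 0.2 * df["treat"] - 0.1 * df["treat"] * df["covariate"] \
    + rng.normal(0, 1, n)

fit = smf.ols("dwms ~ treat * covariate + C(stratum)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school"]})
print(fit.params[["treat", "covariate", "treat:covariate"]])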
A.2 Relationship between management practices and the Stallings classroom observation and SisAT tools
Was it reasonable to expect that providing training on two tools would improve managerial practices and test scores? We address this question by looking at the correlation of self-reported use of the Stallings classroom observation and SisAT tools with both the DWMS and test scores. As mentioned above, these results should be interpreted with caution since they rely on school principals’ self-reported assessments. Beyond measurement error problems, schools that answered the online surveys differ statistically from those that did not on several observable characteristics, including treatment status (see Tables A.9–A.12). Hence, this section does not attempt to establish a causal relationship between the use of the management tools, Stallings and SisAT, and the DWMS or test scores. Instead, the three correlations described in this section are presented for completeness.
First, using both tools is positively correlated with the DWMS (see Table A.17). In
other words, the more likely principals are to use the management tools, the higher
the DWMS index. Second, “direct training” schools are more likely than those that re-
ceived cascade-style training to implement both tools (see Table A.19). Thus, the “direct
training” intervention was more successful than the cascade intervention at encouraging
principals to use the management tools. Combining these two results—“direct training”
schools are more likely to use the management tools provided to them, and these tools
are correlated with the DWMS—it is unsurprising that the treatment improves DWMS
scores (as shown in Panel A, Table 4). Finally, the self-reported information also shows
that the correlation between using the management tools and test scores is not statisti-
cally significant (see Table A.18), which is aligned with the finding that the treatment
did not improve learning outcomes (as shown in Panel B, Table 4).
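As a concrete reference for how these indexes are described in the table notes, the short Python sketch below builds an implementation (or use) index as the simple average of a school's survey items and then standardizes it. The item names and responses are simulated assumptions.

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Simulated binary survey items on tool implementation (names hypothetical).
items = pd.DataFrame(rng.integers(0, 2, size=(650, 5)),
                     columns=[f"stallings_item_{i}" for i in range(5)])

raw = items.mean(axis=1)                  # simple average of the items
index = (raw - raw.mean()) / raw.std()    # standardize to mean 0, sd 1
print(index.describe().round(2))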
Table A.17: Association between DWMS and implementation and use indexes
(1) (2) (3) (4)
Management index
Implementation index Stallings 0.049
(0.029)
Use index Stallings 0.064∗∗
(0.028)
Implementation index SisAT 0.064∗∗
(0.027)
Use index SisAT 0.086∗∗∗
(0.033)
No. of obs. 650 645 691 686
This table presents the conditional correlation between DWMS and implementation
and use indexes. The implementation and use indexes are constructed as the simple
average of the online survey variables for each element of the intervention. The man-
agement index and the implementation and use indexes are standardized. All regressions control for strata fixed effects and enumerator fixed effects. Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.18: Association between learning outcomes and implementation and use indexes
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Math Language Average
Implementation index Stallings 0.024 0.017 0.024
(0.016) (0.016) (0.016)
Use index Stallings 0.0087 0.027 0.020
(0.018) (0.017) (0.019)
Implementation index SisAT 0.0044 0.014 0.0098
(0.015) (0.015) (0.016)
Use index SisAT -0.0084 -0.022 -0.016
(0.017) (0.017) (0.017)
No. of obs. 27,643 27,516 28,994 28,711 27,966 27,837 29,247 28,965 26,682 26,561 28,076 27,807
This table presents the conditional correlation between learning outcomes and implementation and use indexes. The implementation and use indexes are
constructed as the simple average of the online survey variables for each element of the intervention. The learning outcomes and implementation and use
indexes are standardized. All regressions control for strata fixed effects. Standard errors are clustered at the school level. ∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
Table A.19: Treatment on Stallings and SisAT implementation and use
Implementation index Stallings Use index Stallings Implementation index SisAT Use index SisAT
Direct training 0.39∗∗∗ 0.38∗∗∗ 0.15∗∗ -0.082
(0.074) (0.071) (0.067) (0.068)
No. of obs. 827 822 866 860
Control mean -0.27 -0.23 -0.08 0.05
Both the implementation and use indexes are constructed as the simple average of the online survey variables for each element of the intervention.
All regressions take into account the randomization design—i.e., include strata fixed effects. Standard errors are clustered at the school level.
∗ p<0.10, ∗∗ p<0.05, ∗∗∗ p<0.01
A.3 Additional details on the experimental design
The sampling strategy and final criteria for selecting schools to participate in the experiment varied slightly across the seven participating states. A total of 1,496 schools in the original experimental design were assigned to one of three groups: (1) “train the trainer” schools, which received a school grant and school principals’ managerial training using the cascade model (n=599); (2) “direct training” schools, which received a school grant and school principals’ managerial training delivered by professional trainers (n=698); and (3) a “no grants” group, which received school principals’ managerial training using the cascade model (n=199). Table A.20 summarizes the interventions received by the three groups included in our research design. In this paper, we focus on the difference between the “direct training” and “train the trainer” schools.
Table A.20: Summary of the three experimental groups
                                                               Direct    Train the  No
                                                               training  trainer    grants
School budgetary autonomy
   School cash grant                                           Yes       Yes        –
School principal managerial training
   Classroom observation + training by professionals           Yes       –          –
   Classroom observation + training via the cascade            –         Yes        Yes
   Foundational skills measurement + training by professionals Yes       –          –
   Foundational skills measurement + training via the cascade  –         Yes        Yes
Estado de México and Puebla had all three types of schools (“train the trainer,” “direct training,” and “no grants”), while the other five states had only “direct training” and “train the trainer” schools. In Estado de México, due to its large number of eligible schools, the local authorities decided to have two experiments. In one experiment, schools were randomly assigned to either “train the trainer” or “direct training”.23 In the second experiment (not part of the main sample in this paper), we sampled 200 schools that did not participate in the first experiment from the metropolitan area of Mexico City and randomly assigned them to either “direct training” or “no grants.”24
23Schools in this experiment are not a representative sample of schools in the state. They are larger than average, more likely to be in rural areas (see Figure A.3c), and have below-average achievement levels.
24Since all eligible schools in the first call for applications had already been notified that they were selected to participate in the program, in order to have a control group, state-level education authorities issued a second call for applications. These schools are mainly in urban areas of Estado de México (see Figure A.3d), in what is considered the greater metropolitan area of Mexico City.
In Puebla, schools were stratified based on their locality’s marginalization (high/low) and whether they were urban or rural. A random sample of 300 primary schools, proportional to the size of each stratum, was selected to participate in the experiment. We ranked schools within each stratum based on their enrollment and assigned them in a repeating sequence to “train the trainer,” “direct training,” or “no grants”; the arm with which the sequence began was randomized.
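A minimal Python sketch of this assignment rule, on simulated data, is below: within each stratum, schools are sorted by enrollment and assigned to the three arms in a repeating sequence whose starting arm is randomized. The strata and enrollment figures are hypothetical.

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
arms = ["train the trainer", "direct training", "no grants"]
df = pd.DataFrame({
    "stratum": rng.integers(0, 4, 300),      # marginalization x urbanization
    "enrollment": rng.integers(60, 600, 300),
})

def assign(group: pd.DataFrame) -> pd.Series:
    ranked = group.sort_values("enrollment", ascending=False)
    start = rng.integers(0, 3)               # randomized starting arm
    labels = [arms[(start + i) % 3] for i in range(len(ranked))]
    return pd.Series(labels, index=ranked.index)

df["arm"] = df.groupby("stratum", group_keys=False).apply(assign)
print(df.groupby(["stratum", "arm"]).size())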
In Morelos, Tlaxcala, Guanajuato, Tabasco, and Durango, eligible public primary
schools were randomly assigned to either “train the trainer” or “direct training.” The
sample sizes were 130, 110, 200, 165, and 200 schools, respectively.
Figure A.3: Geographical distribution of the treatment assignment by state
[Maps of schools by treatment status (not in the RCT; train the trainer; direct training; no grants) for each state: (a) Durango, (b) Guanajuato, (c) Estado de México – 1, (d) Estado de México – 2, (e) Morelos, (f) Puebla, (g) Tabasco, (h) Tlaxcala.]
Table A.21: Treatment assignment by state
State Train the Direct No Blocking
      trainer   training grants
Durango 100 100 0 None.
Estado de México – 1 100 100 0 None.
Estado de México – 2 0 100 100 None.
Morelos 100 100 0 Blocking by school level. 130 primary schools and 70 secondary schools in separate blocks.
Tlaxcala 100 100 0 Blocking by school level. 110 primary schools and 90 secondary schools in separate blocks.
Guanajuato 100 100 0 200 schools randomly sampled. Schools were ranked based on enrollment. Odd-ranked schools were assigned to “train the trainer” and even-ranked schools to “direct training.”
Tabasco 83 82 0 165 eligible schools (with 60 or more students and 5 or more teachers). Eligible schools were ranked by priority (level 2, 4, 5, or 6) and number of students. Odd-ranked schools were assigned to “train the trainer” and even-ranked schools to “direct training.”
Puebla 101 100 99 300 eligible schools (60 or more students and 5 or more teachers). Blocks were created based on marginalization and urbanization. For the marginalization categories, the “very high” and “high” CONAPO classifications were grouped into a single category (“high” marginalization), and the “medium,” “low,” and “very low” CONAPO classifications were grouped into another single category (“not high” marginalization).
This table shows the number of schools in each design and their distribution among treatment conditions. Note that there are two distinct designs for Estado de México, each with 200 schools. Blocking was used in five of the eight designs. While in some states the randomization included secondary schools, in this paper we focus on primary schools, as this is the sample for which we have learning outcome data. Since the randomization in such cases was stratified by school type (primary vs. secondary), this sample restriction does not threaten the internal validity of our results.
A.4 Detailed cost calculations
Our analysis includes both variable and fixed costs. The data we use to estimate the
costs come from four sources: (i) official program documents, (ii) interviews with imple-
menting staff from the federal government, (iii) administrative records, and (iv) federal
government payroll data.25 We do not include the direct and indirect costs of imple-
menting the overall Escuela al Centro strategy beyond those associated with the “direct
training” and the “train the trainer” interventions. We report the average variable cost of
implementing the interventions across the seven participating states. All cost estimates
reported are expressed in 2015 USD and correspond to average per-school costs.
Table A.22 summarizes the interventions’ total costs, dividing them into variable
and fixed costs (staff time). The total fixed cost of implementing the interventions in
“train the trainer” and “direct training” schools (1,198 schools) over the two years of
implementation was approximately 1.75 million USD (1,462 USD per school). This cost
includes the salaries of the federal- and state-level field coordination teams, which led
the design, implementation, and monitoring of all activities included as part of this
impact evaluation.
The largest variable cost was the training provided to school principals to improve
their management capacities. This cost includes lodging, transportation, materials, fa-
cilities, and catering for beneficiaries. The capacity-building activities were designed
around two training programs—one for the Stallings classroom observation and one for
the SisAT tool. The implementation of the two training programs with professional trainers, which benefited the “direct training” school principals, had an average cost of 472 USD per school: 266 USD per school for the Stallings classroom observation tool and 206 USD per school for the SisAT tool. Both training programs were delivered in each of the two years of implementation. Each program consisted of a five-day training session led by a team of professional trainers, including staff from the federal education authority (SEP) who had developed the tools.
All schools participating in the impact evaluation (i.e., “direct training,” “train the
trainer,” and “no grants”) received the training in these tools via the cascade model. The
training cost in the “train the trainer” model was around 2.4 USD per school.
25All federal employees’ salaries in Mexico are published annually on the Nomina Transparente website:
https://nominatransparente.rhnet.gob.mx/.
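As a quick arithmetic check on the figures reported in Table A.22 below, the following Python snippet reproduces the per-school and per-student fixed costs from the stated totals; the average school size of roughly 279 students comes from Table A.1, and small discrepancies reflect rounding in the text.

total_fixed_usd = 1_750_000     # ~1.75 million USD of staff time (rounded)
n_schools = 1_198
students_per_school = 279       # average enrollment, Table A.1

per_school = total_fixed_usd / n_schools          # ~1,461 USD per school
per_student = per_school / students_per_school    # ~5.24 USD per student

print(f"fixed cost per school:  {per_school:,.0f} USD")
print(f"fixed cost per student: {per_student:.2f} USD")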
Table A.22: Total cost of implementing the interventions over two years by treatment
group (in 2015 USD)
Description Per-school cost Per-student cost
Direct training Train the trainer Direct training Train the trainer
Panel A: Fixed cost (a)
Staff time 1,462 1,462 5.24 5.24
Panel B: Variable costs
Managerial intervention 472 2.4 1.69 0.001
Total 1,934 1,464 6.93 5.24
(a) The core team from the SEP included: one Director General, one Coordinator of Academic Activities, one Head of Technical Issues and Monitoring, and two Department Chiefs. The core team in each state-level education authority generally included three staff members at the “Area Subdirector” level. The core team that developed the classroom observation tool included five senior staff from the SEP: one “Area Director,” one “Area Subdirector,” two “Department Directors,” and one analyst. The tool was adapted from an existing classroom observation tool developed in 2011 as part of a large-scale study on classroom time use (Bruns & Luque, 2014). The core team that developed the data-to-guide-instruction tool included eight senior staff from the SEP: one “Area Director,” five “Area Subdirectors,” and two IT experts. The tool was developed to support the implementation of the Escuela al Centro strategy and involved close coordination with staff from the Curriculum Area of the SEP.
A.5 Short-term leadership certificate training program
Some of the principals in “direct training” schools participated in a short-term leadership
certificate training program. The state-level education authorities selected this program
based on Federal SEP guidelines related to national school principals’ profile standards.
These standards define principals as school and community leaders who: (i) know the
school’s and classroom’s dynamics, as well as the school’s organization and operation;
(ii) are recognized as professionals who continuously participate in professional devel-
opment activities to improve the quality of the educational service; (iii) assume and
promote the legal and ethical principles inherent in their roles and educational work, to
ensure students’ right to a quality education; and (iv) know the school’s social and cultural context and establish collaborative relationships with the community, the school zone, and other institutions to enrich the educational task.26
The DWMS instrument used to measure school principals’ managerial capacity is constructed based on four dimensions of managerial practices: operations management, people management, target setting, and monitoring. When adapting the DWMS instrument to Mexico, the Federal SEP included an additional dimension related to leadership, as this aligned well with the national school principals’ profile standards. However, the DWMS measure used here is constructed using only the original four dimensions.
26The school principals’ profile standards are available at: http://servicioprofesionaldocente.sep.gob.mx/portal-docente-2014-2018/2018/PPI_PROMOCION_EB_2018_19012018.pdf