Running Randomized Evaluations: A Practical Guide
... Randomised controlled trials conducted in natural settings are often called field experiments. Randomised controlled trials are used when it is unclear what the actual effect of a treatment, such as a development programme, would be and whether it is effective (Gerber and Green, 2012; Glennerster and Takavarasha, 2013). Trials can also inform policy implementation, because the costs and risks of a small-scale experiment are significantly lower than those of a full-scale implementation (Haynes et al., 2012). ...
... As a result, randomised controlled trials allow causal inferences to be made. When the treatment and control groups are identical at the beginning of the experiment, any difference later observed between the groups is attributed to the treatment (Gerber and Green, 2012; Glennerster and Takavarasha, 2013; Haynes et al., 2012). ...
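Under random assignment, treatment status is independent of the potential outcomes. The difference in group means is therefore an unbiased estimator of the average treatment effect. The following is a minimal simulation sketch. It assumes a hypothetical constant effect of 2.0 and normally distributed outcomes; neither value is taken from the cited works.

```python
import numpy as np


def difference_in_means(y: np.ndarray, d: np.ndarray) -> float:
    """Mean outcome of treated units (d == 1) minus mean outcome of controls (d == 0)."""
    return float(y[d == 1].mean() - y[d == 0].mean())


def simulate(n: int = 10_000, effect: float = 2.0, seed: int = 0) -> float:
    """Simulate one completely randomized experiment (n assumed even) and return its estimate."""
    rng = np.random.default_rng(seed)
    y0 = rng.normal(loc=10.0, scale=3.0, size=n)      # potential outcome without treatment
    y1 = y0 + effect                                  # hypothetical constant treatment effect
    d = rng.permutation(np.repeat([0, 1], n // 2))    # exactly half of the units are treated
    y = np.where(d == 1, y1, y0)                      # only one potential outcome is observed
    return difference_in_means(y, d)


if __name__ == "__main__":
    print(f"estimated effect: {simulate():.3f} (true value 2.0)")
```

Re-running `simulate` with different seeds gives estimates that scatter around 2.0. This scatter is the sampling variability that a standard error summarizes.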
... In randomised controlled trials, non-compliance and partial compliance are possible threats (Gerber and Green, 2012; Glennerster and Takavarasha, 2013). In other words, individuals randomly assigned to the treatment group may not participate at all, or may participate only partially, in the programme. ...
... As a result, they recommend instead stratifying units into larger groups. Glennerster and Takavarasha (2013) warn against the practice of dropping pairs and point out that the widespread practice of including pair fixed effects in a regression of outcomes on treatment is equivalent to computing the difference-in-means estimator after dropping pairs. Accordingly, they go on to suggest that experimenters instead stratify the units into larger groups if there is a risk of attrition. ...
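The equivalence claimed in this snippet can be checked numerically. In a regression with one dummy per pair, a unit whose partner attrited is fitted exactly by its own pair dummy. It therefore contributes nothing to the treatment coefficient, and the coefficient equals the difference in means over complete pairs. The sketch below uses hypothetical simulated data: 200 pairs, 30 randomly attrited units, and a true effect of 0.5.

```python
import numpy as np


def pair_fe_coefficient(y, d, pair):
    """OLS coefficient on d in y ~ d + one dummy per pair (the dummies replace the intercept)."""
    labels = np.unique(pair)
    dummies = (pair[:, None] == labels[None, :]).astype(float)
    X = np.column_stack([d.astype(float), dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0])


def complete_pair_difference(y, d, pair):
    """Difference in means computed only on pairs in which both units are still observed."""
    labels, counts = np.unique(pair, return_counts=True)
    keep = np.isin(pair, labels[counts == 2])
    return float(y[keep & (d == 1)].mean() - y[keep & (d == 0)].mean())


def demo(n_pairs=200, n_attrited=30, effect=0.5, seed=1):
    rng = np.random.default_rng(seed)
    pair = np.repeat(np.arange(n_pairs), 2)
    # Randomize within each pair: exactly one unit treated, one control.
    d = np.concatenate([rng.permutation([1, 0]) for _ in range(n_pairs)])
    pair_effect = rng.normal(0.0, 2.0, size=n_pairs)[pair]   # outcomes correlated within pairs
    y = 1.0 + pair_effect + effect * d + rng.normal(size=2 * n_pairs)
    # Hypothetical attrition: a random subset of units has no endline outcome.
    observed = np.ones(2 * n_pairs, dtype=bool)
    observed[rng.choice(2 * n_pairs, size=n_attrited, replace=False)] = False
    y, d, pair = y[observed], d[observed], pair[observed]
    return pair_fe_coefficient(y, d, pair), complete_pair_difference(y, d, pair)


if __name__ == "__main__":
    fe, dropped = demo()
    print(f"pair fixed effects: {fe:.6f}  complete pairs only: {dropped:.6f}")
```

The two numbers agree to floating-point precision. This is why "keeping" singleton observations while including pair dummies does not change the estimate.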
... r drop-out. If a unit drops out of the study [...] its pair unit can also be dropped from the study, while the set of remaining pairs will still be as balanced as the original dataset. In contrast, in a pure randomized experiment, if even one unit drops out, it is no longer guaranteed that the treatment and control groups are balanced, on average." Glennerster and Takavarasha (2013), chapter 4, page 159: "In paired matching, for example, if we lose one of the units in the pair [...] and we include a dummy for the stratum, essentially we have to drop the other unit in the pair from the analysis. [...] Some evaluators have mistakenly seen this as an advantage of pairing [...] But in fact if we drop the pair we have ju ...
In this paper we revisit some common recommendations regarding the analysis of matched-pair and stratified experimental designs in the presence of attrition. Our main objective is to clarify a number of well-known claims about the practice of dropping pairs with an attrited unit when analyzing matched-pair designs. Contradictory advice appears in the literature about whether or not dropping pairs is beneficial or harmful, and stratifying into larger groups has been recommended as a resolution to the issue. To address these claims, we derive the estimands obtained from the difference-in-means estimator in a matched-pair design both when the observations from pairs with an attrited unit are retained and when they are dropped. We find limited evidence to support the claims that dropping pairs is beneficial, other than in potentially helping recover a convex weighted average of conditional average treatment effects. We then repeat the same exercise for stratified designs by studying the estimands obtained from a regression of outcomes on treatment with and without strata fixed effects. We do not find compelling evidence to support the claims that stratified designs should be preferred to matched-pair designs in the presence of attrition.
... For example, one study reported that during the distribution of eyeglasses in the RCT, students in the control group were mistakenly provided with eyeglasses, which may have partly undermined the empirical evidence [7]. Secondly, partial compliance may also threaten the evaluation of RCTs [12]. Specifically, in many similar programs, some students or parents in the treatment group refused free eyeglasses in the belief that wearing them worsens vision, which poses a partial compliance problem [13]. ...
... However, since not all myopic students wore the eyeglasses provided by the program (partial compliance), we also estimated the Local Average Treatment Effect (LATE). The LATE scales up the intention-to-treat estimate to account for partial compliance and recovers the impact of actually wearing eyeglasses for students who wore them because of the program [12,24]. Lastly, we estimated interactions between the intervention and baseline characteristics to analyze heterogeneous treatment impacts of eyeglasses. ...
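The scaling described here is usually computed with the Wald estimator. It divides the intention-to-treat effect on the outcome by the effect of assignment on take-up. The estimate is interpreted as a LATE under the standard assumptions of monotonicity and the exclusion restriction. The sketch below handles two-sided non-compliance, for example control students who received glasses by mistake. The compliance shares (70% compliers, 20% never-takers, 10% always-takers) and the effect size of 0.3 are hypothetical.

```python
import numpy as np


def wald_late(y, z, d):
    """Return (LATE, ITT): the ITT effect on outcomes divided by the ITT effect on take-up.

    y: outcome, z: random assignment (1 = offered), d: actual take-up (1 = used).
    """
    itt_y = y[z == 1].mean() - y[z == 0].mean()   # intention-to-treat effect on the outcome
    itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage: effect of the offer on take-up
    if np.isclose(itt_d, 0.0):
        raise ValueError("assignment does not shift take-up, so the LATE is not identified")
    return float(itt_y / itt_d), float(itt_y)


def simulate(n=100_000, effect=0.3, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)                          # random offer
    kind = rng.choice(["complier", "never", "always"], size=n, p=[0.7, 0.2, 0.1])
    d = np.where(kind == "complier", z, np.where(kind == "always", 1, 0))
    y = rng.normal(size=n) + effect * d                     # hypothetical constant effect of use
    return wald_late(y, z, d)


if __name__ == "__main__":
    late, itt = simulate()
    print(f"ITT = {itt:.3f}, LATE = {late:.3f}")
```

In this simulation the ITT is diluted to roughly 0.7 × 0.3. The Wald ratio scales it back up to about 0.3. When control units cannot take up the treatment at all (one-sided non-compliance), the denominator reduces to the take-up rate among those offered.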
Although eyeglasses have been considered a cost-effective way to combat myopia, the empirical evidence on their impact on learning outcomes is inconsistent. This paper provides empirical evidence on the effect of providing eyeglasses on academic performance in two provinces with different levels of economic development in western China. Overall, we find a significant intention-to-treat impact and a large and significant local average treatment effect of providing free eyeglasses to students in the poor province, but not in the other. The difference in impact between the two provinces is not a matter of experimental design, implementation, or partial compliance. Instead, we find that the lack of impact in the wealthier province is mainly due to less blackboard usage in class and wealthier households. Our study finds that providing free eyeglasses boosted the academic performance of disadvantaged groups more than that of their counterparts.
... ITT shows the causal impact of offering the treatment (being "invited") on participants' financial behaviours and is often used by policymakers because it gives the average effect of being invited. TOT shows the causal impact of actual exposure to the treatment on participants' financial behaviour and is useful to the evaluator (Ravallion 2007; Glennerster and Takavarasha 2013). Random invitation status was used as the Treat_i variable in estimating the ITT, whereas actual participation in the workshop was used as the Treat_i variable for the TOT. ...
... Attrition within the sample was low (18.1%) (Table 2); however, we examined whether this attrition posed threats to the validity of the analyses (Glennerster and Takavarasha 2013). We investigated whether attrition systematically affected the outcome variables of interest, using the baseline sample in Eq. (2), where Y is a primary outcome variable at baseline, T is an indicator for the treatment group, and Attrit is an indicator for attrition. ...
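Eq. (2) itself is not recoverable from this snippet. A common form of this check regresses the baseline outcome on T, Attrit, and their interaction, and the sketch below assumes that specification. The Attrit coefficient asks whether attriters differed from stayers at baseline. The interaction asks whether that gap differs between arms, i.e. whether attrition was differential. The sketch uses plain OLS with homoskedastic standard errors; robust or clustered errors may be more appropriate in practice.

```python
import numpy as np

NAMES = ("const", "T", "Attrit", "T_x_Attrit")


def attrition_check(y_baseline, t, attrit):
    """OLS of a baseline outcome on T, Attrit and T x Attrit (assumed specification).

    Returns (coefficients, homoskedastic standard errors) as dicts keyed by NAMES.
    """
    t = np.asarray(t, dtype=float)
    attrit = np.asarray(attrit, dtype=float)
    X = np.column_stack([np.ones_like(t), t, attrit, t * attrit])
    coef, *_ = np.linalg.lstsq(X, y_baseline, rcond=None)
    resid = y_baseline - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    # "Attrit": baseline gap between attriters and stayers in the control group.
    # "T_x_Attrit": how much larger that gap is in the treatment group (differential attrition).
    return dict(zip(NAMES, coef)), dict(zip(NAMES, se))
```

Because the model is saturated in the two indicators, each coefficient is a contrast of the four cell means (arm × attrition status) of the baseline outcome.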
This study conducts a randomised controlled trial offering a technical workshop to examine whether providing information about the full range of services on the mobile money platform increases mobile money usage, taking the Ashanti Region, Ghana, as a case. We find a significant positive impact of mobile money education on the recent usage of mobile money for transactions. However, we find no significant evidence that the workshop affected new mobile money account ownership or the share of transactions made through mobile money. Furthermore, the impacts on remittances after the interventions were weak and volatile. We discuss potential reasons behind the weak effects found.
Supplementary information:
The online version contains supplementary material available at 10.1057/s41287-022-00529-x.
... Consider now an example from another context: the case of contaminated water sources discussed by Glennerster and Takavarasha (2013). Suppose you are in charge of evaluating alternatives for improving water quality with the goal of preventing diarrhea in children, as international aid agencies often do. ...
Context
There has been an inflation of behavioral frameworks applied to social problems such as tax dodging. There has also been a surge in the creation of so-called nudge units around the world, following the success of the pioneering units in the USA and the UK. Meanwhile, nudge approaches have been criticized for ‘psychologism,’ paternalism, and short-termism. Moreover, by ignoring systems thinking, complexity science, and other broader approaches, nudging may lead to interventions that are ineffective or counterproductive in the long term.
Goal
To overcome such limitations, the paper proposes an integrative framework, the Nested Circles Model, which puts the intended behaviors in a perspective ranging from microworlds to broader upstream influences.
Method
The paper employs a qualitative approach to critically review the literature on nudging and map its shortcomings.
Results and contributions
The proposed model integrates major concepts from popular behavioral frameworks and incorporates elements that influence the repertoire of behaviors adopted by individuals, including intangible stocks (trust and fairness) and complexity markers. The paper concludes by exemplifying the application of the Nested Circles Model to three problems in the context of taxation in Brazil.
Keywords:
behavioral economics; systems thinking; behavioral science; complexity science; taxation
... be systematically different from those subject to the control, but this problem can be avoided if the lottery is conducted around an eligibility cutoff. For example, suppose a government has sufficient resources to offer housing vouchers to all households below the poverty line. Suppose further that it wants to evaluate some aspect of the program, for instance, its effects on self-reported health and labor market participation. ...
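In a lottery around a cutoff, clearly eligible units are always offered the program and clearly ineligible units never are. Only units in a band around the cutoff are randomized. The sketch below assumes a hypothetical eligibility score in which higher means more eligible, plus a hypothetical bandwidth. For the voucher example, the score could be how far a household's income falls below the poverty line.

```python
import numpy as np


def lottery_around_cutoff(score, cutoff, bandwidth, share_treated=0.5, seed=0):
    """Return (assignment, in_lottery) for a lottery conducted around an eligibility cutoff.

    Units with score >= cutoff + bandwidth are always offered the program, units with
    score <= cutoff - bandwidth never are, and units strictly inside the band are
    randomized, with share_treated of them offered the program.
    """
    rng = np.random.default_rng(seed)
    score = np.asarray(score, dtype=float)
    assignment = (score >= cutoff + bandwidth).astype(int)     # clearly eligible: always offered
    in_lottery = np.abs(score - cutoff) < bandwidth            # marginal units: randomized
    idx = np.flatnonzero(in_lottery)
    winners = rng.choice(idx, size=int(round(share_treated * idx.size)), replace=False)
    assignment[winners] = 1
    return assignment, in_lottery
```

Impact estimates then compare lottery winners with lottery losers only (`assignment[in_lottery]`). This is why the results speak to units near the cutoff rather than to the whole eligible population.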
Government agencies and nonprofit organizations have increasingly turned to randomized controlled trials (RCTs) to evaluate public policy interventions. Random assignment is widely understood to be fair when there is equipoise; however, some scholars and practitioners argue that random assignment is also permissible when an intervention is reasonably expected to be superior to other trial arms. For example, some argue that random assignment to such an intervention is fair when the intervention is scarce, for it is sometimes fair to use a lottery to allocate scarce goods. We investigate the permissibility of randomization in public policy RCTs when there is no equipoise, identifying two sets of conditions under which it is fair to allocate access to a superior intervention via random assignment. We also reject oft-made claims that alternative study designs, including stepped-wedge designs and uneven randomization, offer fair ways to allocate beneficial interventions.
... Since the randomization was stratified by district, we always control for district fixed effects, but our most parsimonious regressions do not include other control variables (Glennerster and Takavarasha, 2013). In additional specifications, to improve precision and account for any chance imbalances, we (1) control for the baseline outcome variables at the village level, y_{v,1} and y_{v,2}, from rounds 1 and 2 and (2) employ the double-lasso procedure of Urminsky et al. (2016) to select control variables from a large pool of pre-treatment measures. ...
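With randomization stratified by district, the OLS treatment coefficient from a regression with district dummies is a weighted average of the within-district treatment-control differences. A standard result, obtained via the Frisch-Waugh-Lovell theorem, is that district s gets weight proportional to n_s p_s (1 - p_s). Here n_s is the number of units in the district and p_s the share treated. The sketch checks this on hypothetical simulated districts with invented sizes, treated shares, and effects. It omits the baseline-outcome controls and the double-lasso step mentioned above.

```python
import numpy as np


def strata_fe_coefficient(y, d, stratum):
    """OLS coefficient on d in y ~ d + one dummy per stratum (district fixed effects)."""
    labels = np.unique(stratum)
    dummies = (stratum[:, None] == labels[None, :]).astype(float)
    coef, *_ = np.linalg.lstsq(np.column_stack([d.astype(float), dummies]), y, rcond=None)
    return float(coef[0])


def variance_weighted_average(y, d, stratum):
    """Within-stratum differences in means weighted by n_s * p_s * (1 - p_s)."""
    num = den = 0.0
    for s in np.unique(stratum):
        m = stratum == s
        p = d[m].mean()
        w = m.sum() * p * (1 - p)
        if w == 0:                       # no treated or no control units in this stratum
            continue
        num += w * (y[m & (d == 1)].mean() - y[m & (d == 0)].mean())
        den += w
    return num / den


def demo(seed=0):
    rng = np.random.default_rng(seed)
    sizes, shares, effects = [40, 100, 60], [0.5, 0.3, 0.5], [1.0, 2.0, 3.0]  # hypothetical
    stratum = np.repeat(np.arange(3), sizes)
    d = np.concatenate([rng.permutation(np.repeat([1, 0], [round(n * p), n - round(n * p)]))
                        for n, p in zip(sizes, shares)])
    y = 5.0 * stratum + np.array(effects)[stratum] * d + rng.normal(size=stratum.size)
    return strata_fe_coefficient(y, d, stratum), variance_weighted_average(y, d, stratum)
```

When effects differ across districts, this weighting can differ from the population-share weighting behind the overall average treatment effect. It coincides with that weighting when every district has the same treated share.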
We estimate the equilibrium effects of a public school grant program administered through school councils in Pakistani villages with multiple public and private schools and clearly defined catchment boundaries. The program was randomized at the village level, allowing us to estimate its causal impact on the market. Four years after the start of the program, test scores were 0.2 sd higher in public schools. We find evidence of an education multiplier: test scores in private schools were also 0.2 sd higher in treated markets. Consistent with standard models of product differentiation, the education multiplier is greater for those private schools that faced a greater threat to their market power. Accounting for private sector responses increases the program's cost effectiveness by 85 percent and affects how a policymaker would target spending. Given that markets with several public and private schools are now pervasive in low- and middle-income countries, prudent policy requires us to account for private sector responses to public policy, both in their design and in their evaluation.
... The criticism is less often about RCTs per se than about putting them on a pedestal, with a special status that accords them more credibility than other evaluation methods. These RCTs focus on evaluating government or NGO programs and policies, and the hope of proponents is that having more credible measures of impact through randomization will mean better investments and interventions (Glennerster and Takavarasha 2013; Kremer 2003; Banerjee and Duflo 2009). The questions usually focus on "what works. ...
In October 2019, Abhijit Banerjee, Esther Duflo, and Michael Kremer jointly won the 51st Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel "for their experimental approach to alleviating global poverty." But what is the exact scope of their experimental method, known as randomized controlled trials (RCTs)? Which sorts of questions are RCTs able to address and which do they fail to answer? This book provides answers to these questions, explaining how RCTs work, what they can achieve, why they sometimes fail, how they can be improved and why other methods are both useful and necessary. Chapters contributed by leading specialists in the field present a full and coherent picture of the main strengths and weaknesses of RCTs in the field of development. Looking beyond the epistemological, political, and ethical differences underlying many of the disagreements surrounding RCTs, it explores the implementation of RCTs on the ground, outside of their ideal theoretical conditions, and reveals some unsuspected uses and effects, their disruptive potential, and also their political uses. The contributions uncover the implicit worldview that many RCTs draw on and disseminate, and probe the gap between the method's narrow scope and its success, while also proposing improvements and alternatives. This book warns against the potential dangers of their excessive use, arguing that the best use for RCTs is not necessarily the one that immediately springs to mind, and offers readers the opportunity to come to an informed and reasoned judgement on RCTs and what they can bring to development.
... Notably, efforts have been made to perform a randomized trial on the model (Boruch & Mosteller, 2002; Glennerster & Takavarasha, 2013) using countries not included in the samples of this research. This was done not only in Airhunmwunde (2004); the approach has also been repeatedly modified over the years, with several changes to the countries selected. ...
... In the first instance, the donor and the partner NGOs did not agree to full randomisation, where all programme participants would be randomly chosen from a pool of applicants. Instead, we agreed to employ a partial randomisation, also known as a lottery around the cut-off (Glennerster and Takavarasha 2013). To get accepted for the programme, participants were evaluated on three criteria: (1) school performance (including grades and other school activities); (2) participation (in youth, civic and other organisations); and (3) essay responses to questions from the organisers. ...
This study examines democracy promotion efforts that target young people in post-Soviet countries. Specifically, we assess the effectiveness of a civic education programme in Poland in improving attitudes toward democracy and self-perceptions of political efficacy. The analysis of quasi-experimental data reveals that young citizens from post-Soviet states (Belarus, Moldova, Russia, and Ukraine) were more likely to show greater support for democratic institutions, hold democratic attitudes, and perceive themselves as having political efficacy. However, we interpret the results with caution, as the changes in attitudes were not substantial. This may be because democracy education programmes attract young people who are already politically and socially active.
... NGO-researcher collaborations undoubtedly yield valuable insights about the effectiveness of environmental, health, and development interventions (e.g., Brooks et al., 2016; Jayachandran et al., 2017; Miguel and Kremer, 2004). Indeed, proponents argue that "many of the best RCTs have come from long-term partnerships between researchers and NGOs or other local partners" (Glennerster, 2015). That said, we show that this seemingly symbiotic relationship may mask the unique roles NGOs often play in remote, rural settings, further exacerbating existing challenges associated with the generalizability, scalability and credibility of solutions deemed effective in applied research (Barrett and Carter, 2014; Ioannidis, Stanley, and Doucouliagos, 2017; Kowalski, 2022; Peters, Langbein, and Roberts, 2018; Vivalt, 2020). ...
Programs implemented by non-governmental organizations (NGOs) are often more effective than comparable efforts by other actors, yet relatively little is known about how implementer identity drives final outcomes. By combining a stratified field experiment in India with a triple-differences estimation strategy, we show that a local development NGO's prior engagement with target communities increases the effectiveness of a technology-promotion program implemented in these areas by at least 30 percent. This “NGO reputation effect” has implications for the generalizability and scalability of evidence from experimental research conducted with local implementation partners.
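A triple-differences (DDD) estimate is the difference between two difference-in-differences estimates. The exact specification used in the paper is not given here. The sketch below assumes a generic 2×2×2 layout: NGO-engaged vs. non-engaged areas, treated vs. control, and post vs. pre periods. It computes DDD from hypothetical cell means; with full interactions, the corresponding regression coefficient is the same contrast.

```python
from itertools import product


def diff_in_diff(means, group):
    """(treated: post - pre) - (control: post - pre) within one group of areas."""
    return ((means[(group, 1, 1)] - means[(group, 1, 0)])
            - (means[(group, 0, 1)] - means[(group, 0, 0)]))


def triple_difference(means):
    """DiD in previously engaged areas minus DiD in non-engaged areas.

    means maps (engaged, treated, post) -> mean outcome, each index being 0 or 1.
    """
    return diff_in_diff(means, 1) - diff_in_diff(means, 0)


def example_means():
    """Hypothetical cell means: the treatment raises outcomes by 0.5 everywhere and by a
    further 0.4 where the NGO was previously engaged."""
    return {
        (e, t, p): 10 + 1.0 * e + 2.0 * t + 3.0 * p + 0.5 * t * p + 0.4 * e * t * p
        for e, t, p in product((0, 1), repeat=3)
    }


if __name__ == "__main__":
    print(round(triple_difference(example_means()), 10))   # recovers the extra 0.4
```

Level differences between areas, arms, and periods (the 1.0, 2.0 and 3.0 terms) cancel. Only the extra effect in engaged areas survives.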
... Moreover, treatment-as-usual (TAU) control groups may be more at risk of Hawthorne and John Henry effects: the treatment and control groups behave differently because they know that they are participating in a study (Glennerster, 2013). While such effects are difficult to avoid completely in education interventions, placebo control groups also know that they are participating in a study and, if the placebo treatment works well, believe that they participate on equal terms with the treatment group. ...
This is the protocol for a Campbell systematic review. Our primary objective for this systematic review is to examine if preschool and school‐based interventions aimed at improving language, literacy, and/or mathematical skills increase children's and adolescents' executive functions. As a secondary objective, we will examine how the effects of language, literacy, and mathematics interventions on executive functions are moderated by the subject of the intervention, child age or grade, the type of EF measured, and the at‐risk status of participants. We will also explore how the effects are moderated by other study characteristics, and estimate the effects of the included interventions on language, literacy, and mathematical skills.
... Communities and study households will be informed of the study and sensitized by the research team to its activities and purpose through community meetings and meetings with local leaders. Treatment and control groups will not be informed of their assigned group to avoid the possibility of introducing Hawthorne or John Henry effects [25]. An informed consent including maintenance of the confidentiality of personal data, and the possibility to refuse the consent without having to justify the refusal, will be obtained from all community leaders and study participants prior to responding to any surveys. ...
Timor-Leste is one of the world’s most malnourished nations where micronutrient-deficient diets are a contributing factor to the prevalence of child stunting, currently estimated to be 45.6% of children under five. Fish are an important source of nutrients and one that may assist the country’s predominantly rural population of agriculturalists to exit poverty and malnutrition. However, a small national fishing fleet producing low catch volumes places fish out of reach of most inland and upland populations where it is needed most. Fish consumption is very low in rural, inland areas compared to coastal, regional, and global averages. This study is a one-year, partially masked, cluster-randomized controlled trial among families living in rural, inland Timor-Leste. We aim to test and compare the effects of two treatments, alone and in combination, on the frequency and volume of household fish consumption in rural, inland areas as a proxy for improved dietary diversity and micronutrient intake. Treatment 1 is the installation of nearshore, moored fish aggregating devices (FADs) to improve catch rates with existing fishing gears. Treatment 2 is a social and behaviour change (SBC) activity to promote fish consumption. Villages in inland communities will be randomized to receive treatment 1, treatment 2, both treatments, or neither treatment. Data will be collected at baseline (prior to the rollout of the treatments) and endline. Our study will determine the impact of an improved supply of fish, along with nutrition-oriented SBC activities, on the fish purchasing and consumption practices of rural, inland households. Findings from this study are urgently needed by Small Island Developing States to guide policy and investment decisions on how best to improve households’ diets using locally available, nutrient-dense foods such as fish. Investments such as these are needed to break the cycle of malnutrition. This trial is registered at clinicaltrials.gov (NCT04729829).
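The four-arm design described here is a 2×2 factorial cluster-randomized trial. Whole villages are randomized, and every household inherits its village's arm. The sketch below assigns hypothetical village identifiers to the four arms in near-equal numbers. It does not reproduce the trial's actual allocation procedure, which is not described in this abstract, and any stratification is omitted.

```python
import numpy as np

ARMS = ("neither", "FADs only", "SBC only", "FADs + SBC")


def factorial_cluster_assignment(villages, seed=0):
    """Assign whole villages (clusters) to the four arms of a 2x2 factorial design.

    Arm sizes differ by at most one village; every household inherits its village's arm.
    """
    rng = np.random.default_rng(seed)
    arms = np.resize(np.arange(len(ARMS)), len(villages))   # 0, 1, 2, 3, 0, 1, ...
    rng.shuffle(arms)                                        # random order -> random assignment
    return {
        village: {"arm": ARMS[a], "fad": int(a in (1, 3)), "sbc": int(a in (2, 3))}
        for village, a in zip(villages, arms)
    }
```

The two indicator columns (`fad`, `sbc`) allow each treatment's main effect and their interaction to be estimated in one regression.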
Trial registration: registered at clinicaltrials.gov, identifier NCT04729829.
... Due to the large number of clusters randomized in the FAARM trial (Wendt et al., 2019), a single post-intervention assessment of empowerment was sufficient to estimate program impact (Glennerster and Takavarasha, 2014). These clusters were allocated to intervention and control arms using covariate-constrained randomization after cluster and individual recruitment had been completed (Bolzern et al., 2019;Hayes and Moulton, 2017). ...
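Covariate-constrained randomization draws many candidate allocations of clusters to arms and scores each for baseline imbalance. One allocation is then chosen at random from the best-balanced subset. The sketch assumes a simple balance metric (the sum of squared standardized differences in arm means), two arms, and a hypothetical acceptance rule that keeps the best 10%. Covariates are assumed to be non-constant. The FAARM trial's actual metric and settings may differ.

```python
import numpy as np


def imbalance(x, assign):
    """Sum over covariates of squared standardized differences in arm means."""
    diff = x[assign == 1].mean(axis=0) - x[assign == 0].mean(axis=0)
    return float(np.sum((diff / x.std(axis=0)) ** 2))


def constrained_randomization(x, n_candidates=10_000, keep_fraction=0.1, seed=0):
    """Covariate-constrained randomization of clusters into two arms.

    x: (clusters x covariates) array of baseline cluster characteristics.
    Draws n_candidates random allocations, keeps the keep_fraction with the lowest
    imbalance, and returns one of those at random, plus the acceptance threshold.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    base = np.repeat([0, 1], [n - n // 2, n // 2])         # near-equal arm sizes
    candidates = np.array([rng.permutation(base) for _ in range(n_candidates)])
    scores = np.array([imbalance(x, a) for a in candidates])
    threshold = np.quantile(scores, keep_fraction)
    acceptable = candidates[scores <= threshold]
    return acceptable[rng.integers(len(acceptable))], float(threshold)
```

The accepted set should stay large enough that the chosen allocation is still meaningfully random. Inference can then account for the restricted randomization, for example with permutation tests over the accepted set.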
Nutrition-sensitive agricultural programs have the potential to improve women's and children's nutrition, along with women's empowerment. The project-level Women's Empowerment in Agriculture Index (pro-WEAI) aims to standardize the measurement of women's agency and enable the assessment of impact over typical project timelines. Within the Food and Agricultural Approaches to Reducing Malnutrition (FAARM) cluster-randomized controlled trial in rural Habiganj, Sylhet, Bangladesh, we examined quantitative pro-WEAI data collected from a subsample of trial participants and their husbands (n = 885) approximately four months after the end of the intervention. We evaluated the impact of a three-year homestead food production program on men's and women's agency separately by pro-WEAI domain and indicator, using multilevel logistic and linear regression. We show that women in the FAARM intervention group had levels of agency similar to men and much higher than women in the control group (Odds Ratio [OR] 7.7, p < 0.001), corresponding to better gender equity in intervention areas (OR 3.5, p < 0.001). The higher levels of agency among intervention women were driven by greater intrinsic and collective agency but not by instrumental agency. Compared to controls, more women in the intervention group found intimate partner violence unacceptable (OR 3.5, p < 0.001), had greater ownership of assets (OR 2.6, p = 0.001), better control of income (OR 1.8, p = 0.042), higher levels of group membership (OR 14.0, p < 0.001), and membership in groups they considered influential (OR 166.8, p < 0.001). Self-efficacy was greater in intervention areas for both women (OR 3.2, p < 0.001) and men (OR 2.3, p = 0.002). Our results contribute to the development of benchmarks for interpreting pro-WEAI scores across programs. Our assessment of the impact of a homestead food production program on women's agency provides additional rationale for women-led agricultural projects.
We plan to build on these findings by examining the role of improved women's agency on the pathway from the intervention to nutritional impacts.
... However, if the intervention were to increase donations by more than 1.2 dollars, it would become economically attractive. More generally, economists should identify interventions for which the benefits outweigh the costs (Glennerster and Takavarasha, 2013). ...
... From textbooks and articles to seminars and online resources, advice on how to successfully design and conduct randomized controlled trials abounds (e.g., Gerber and Green 2012; Glennerster and Takavarasha 2013). Political scientists agonize over the research design, practitioner partnerships, and participant recruitment, to name only a few concerns. ...
... Exact matching searches for and groups people whose observable characteristics (covariates) are identical. If matching is valid, there should be no remaining differences between the control and treatment groups, other than participation itself, that could be correlated with the outcome or with unobservable factors [31,32]. ...
Sub-Saharan Africa's population will grow substantially this century, driven by a large number of births across the continent. Family planning methods provide women with techniques to manage their health and wellbeing. This study investigated how radio communications about family planning changed the perceptions of Ghanaian, Liberian, and Senegalese mothers toward having fewer children. Univariate and multivariate linear regressions after coarsened exact matching (CEM) on selected covariates, using demographic and health survey (DHS) data for mothers aged 15 to 49, indicated that radio communications were effective. This effort supports the need for further research on tailored communication methods for West African mothers over time.
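Exact matching on continuous covariates rarely finds identical units, so coarsened exact matching first bins each covariate. It then matches exactly on the bin codes and discards strata that lack either treated or control units. The sketch below is a simplified version of that idea. It estimates the effect on the treated as a treated-count-weighted average of within-stratum differences, rather than the weighted regressions the study reports. Covariate names and bin edges are hypothetical.

```python
import numpy as np
from collections import defaultdict


def cem_att(y, treated, covariates, bins):
    """Simplified coarsened exact matching estimate of the effect on the treated (ATT).

    covariates: dict name -> 1-D array; bins: dict name -> bin edges for that covariate.
    Returns (ATT, number of matched treated units).
    """
    # 1. Coarsen each covariate into bins; 2. form strata from the joint bin codes.
    codes = np.column_stack([np.digitize(covariates[name], bins[name]) for name in covariates])
    strata = defaultdict(list)
    for i, key in enumerate(map(tuple, codes)):
        strata[key].append(i)
    effects, weights = [], []
    for members in strata.values():
        idx = np.asarray(members)
        t = treated[idx] == 1
        if t.all() or not t.any():        # 3. prune strata lacking treated or control units
            continue
        effects.append(y[idx][t].mean() - y[idx][~t].mean())
        weights.append(int(t.sum()))      # 4. weight by matched treated units -> ATT
    if not weights:
        raise ValueError("no stratum contains both treated and control units")
    return float(np.average(effects, weights=weights)), int(sum(weights))
```

Coarser bins keep more units but leave more imbalance within strata. Finer bins approach exact matching and prune more units.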
... However, a new source of data provides the opportunity to improve on the status quo in at least one area of education: online learning. While modern practice in randomized controlled trials often involves recruiting a large and representative sample (Glennerster & Takavarasha, 2013), and the recruitment and research processes are expensive to conduct at scale (Feuer, Towne, & Shavelson, 2002), recruiting and studying large samples is considerably less painful in online learning platforms already used at scale. Commercial platforms for K-12 education are used by tens or hundreds of thousands of students (cf. ...
The purpose of this dissertation was to develop and use a platform that facilitates Massive Open Online Course (MOOC) replication research. Replication and the verification of previously published findings is an essential step in the scientific process. Unfortunately, a replication crisis has long plagued scientific research, affecting even the field of education. As a result, the validity of more and more published findings is being called into question. Research on MOOCs has not been exempt from this. Due to a number of limiting technical barriers, the MOOC literature suffers from issues such as contradictory findings between published works and the unconscious skewing of results caused by overfitting to single datasets. The MOOC Replication Framework (MORF) was developed to allow researchers to bypass these technical barriers. Researchers can design their own MOOC analyses and have MORF conduct them across its massive store of MOOC data. The first study in this dissertation describes the work that went into building the platform that would eventually become MORF and reports a feasibility study investigating whether the platform could perform the tasks it was built for. This was done by replicating previously published findings within a single dataset. The second study describes the initial architecture of MORF and sought to demonstrate the platform’s feasibility for large-scale replication research. This was done by executing a large-scale replication study on data from an entire university’s roster of MOOCs. Finally, the third study highlighted how MORF’s architecture allows for the execution of more than just replication studies.
This was done through the execution of a novel research study that sought to analyze the generalizability of predictive models of completion between the countries present in MORF’s expansive dataset—an important issue to address given the massive enrollment numbers of MOOCs from all around the world.
... The experimental groups are well balanced (see Appendix A2), with no significant differences in participant characteristics between groups. The treatment group reports slightly higher levels of trust in national news organisations, but this difference is not significant at the 5% level, and minor imbalances are to be expected when testing multiple hypotheses (Glennerster & Takavarasha, 2013). Notably, both groups are highly favourable towards the hypothetical candidate, with over 90% inclined to vote for him following vignette 1. ...
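The multiple-testing point can be made concrete. Suppose, for illustration only, that randomization worked and k = 20 independent balance tests are each run at the 5% level; the actual number of covariates in Appendix A2 is not stated here. The chance of at least one "significant" imbalance is then large. Correlated covariates make this somewhat less extreme.

```latex
\Pr(\text{at least one } p < 0.05) = 1 - (1 - 0.05)^{k},
\qquad k = 20 \;\Rightarrow\; 1 - 0.95^{20} \approx 0.64,
\qquad \mathbb{E}[\text{number of rejections}] = 0.05\,k = 1.
```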
How do voters respond to candidates accused of sexual harassment? The literature on political scandals demonstrates that candidate characteristics, scandal type, and voter characteristics matter, as well as party affiliation. However, empirical evidence suggests that not all partisans react the same way. Why is this the case? Our study uses Schwartz's (1996) theory of values to hypothesise that voters prioritising 'universalism' and 'benevolence' are less likely to vote for candidates accused of sexual harassment than voters who prioritise 'self-enhancement' values. Using an original, mixed methods, online survey experiment (n=704), we show that American voters do become less favourable towards candidates linked to allegations of sexual harassment, but a sizeable minority would nevertheless vote for a co-partisan candidate accused of sexual harassment. Values are an important mechanism to explain this heterogeneity. Qualitative data corroborate our findings and help explain why sexual harassment allegations are not always a barrier to electoral success.
A number of developing countries use land expropriation policies to expand cities and develop peri-urban areas. In China alone, an average of 1,600 square kilometers were expropriated annually between 2004 and 2018. The impact of this urban development strategy on expropriated households is not well-understood. I estimate the causal effect of expropriation on Chinese households' livelihood choice and earned income, relying on panel data and comparison to non-expropriated households to observe how household-level outcomes change in response to expropriation. Controlling for baseline outcomes, I find that for at least the first two years, expropriation reduces household agricultural participation and production but does not increase other types of income-generating activities. The result is reduced food security and ability to earn income. Compensation paid to households does not fully offset these effects in cases where households lose all their land or are uncompensated. These findings suggest concrete policies governments can implement to lessen the negative welfare impacts of urban development on expropriated households: higher compensation rates, development of rural non-agricultural labor markets, and direct food assistance to expropriated households.
There is growing concern about the effectiveness of public policies, both academically and in practice. Public policy analysis is increasingly technical and requires multiple techniques to robustly establish the causal effect of a policy. In this article, we review some of the experimentation-based techniques best suited to this purpose. We examine the use of randomized experiments, both laboratory and field, as well as virtual experiments and natural experiments. We also address some common problems in experimental work on public policy, such as the validity of results and ethics. Despite possible drawbacks, incorporating experimental logic into public policy analysis is currently the gold standard at the main international institutions, and it is starting to become so in Spain as well. Mastering the techniques reviewed here, and understanding their importance and limitations, is essential for anyone interested in the field of public policy. This article serves as an introduction and guide to that learning. By adding experimentation to our analyses, we can obtain robust results on which to build more efficient and effective future policies.
The current publication system in economics has encouraged the inflation of positive results in empirical papers. Registered Reports, also called Pre-Results Reviews, are a new submission format for empirical work that takes pre-registration one step further. In Registered Reports, researchers write their papers before running the study and commit to a detailed data collection process and analysis plan. After a first-stage review, a journal can grant In-Principle Acceptance, guaranteeing that the paper will be published if the authors carry out their data collection and analysis as pre-specified. Here we propose a practical guide to Registered Reports for empirical economists. We illustrate the major problems that Registered Reports address (p-hacking, HARKing, forking, and publication bias) and present practical guidelines on how to write and review Registered Reports (e.g., the data-analysis plan, power analysis, and correction for multiple-hypothesis testing), with R and Stata code. We provide specific examples for experimental economics and show how research design can be improved to maximize statistical power. Last, we discuss some tools that authors, editors, and referees can use to evaluate Registered Reports (checklist, study-design table, and quality assessment).
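The power analysis and multiple-testing correction mentioned above can be sketched as follows. This is not the authors' R or Stata code: it is a minimal Python illustration assuming a two-arm trial with equal arm sizes, a two-sided test, and a standardized effect size (difference in means divided by the outcome's standard deviation), using the normal approximation. For multiple hypotheses it applies Holm's step-down correction, one common choice among several. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect_size, alpha=0.05, power=0.80):
    """Units per arm for a two-sided two-sample test of means (equal arms,
    normal approximation); effect_size is in standard-deviation units."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), cap at 1,
        # and enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted
```

Under these assumptions, detecting a 0.2 SD effect at alpha = 0.05 with 80% power requires about 393 units per arm; a hypothesis is rejected at level alpha when its Holm-adjusted p-value is at most alpha.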
This paper proposes an adaptive randomization procedure for two-stage randomized controlled trials. The method uses data from a first-wave experiment in order to determine how to stratify in a second wave of the experiment, where the objective is to minimize the variance of an estimator for the average treatment effect (ATE). We consider selection from a class of stratified randomization procedures which we call stratification trees: these are procedures whose strata can be represented as decision trees, with differing treatment assignment probabilities across strata. By using the first wave to estimate a stratification tree, we simultaneously select which covariates to use for stratification, how to stratify over these covariates, and the assignment probabilities within these strata. Our main result shows that using this randomization procedure with an appropriate estimator results in an asymptotic variance which is minimal in the class of stratification trees. Moreover, our results are able to accommodate a large class of assignment mechanisms within strata, including stratified block randomization. In a simulation study, we find that our method, paired with an appropriate cross-validation procedure, can improve on ad-hoc choices of stratification. We conclude by applying our method to the study in Karlan and Wood (2017), where we estimate stratification trees using the first wave of their experiment.
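Stratified block randomization treats a fixed number of units within each stratum rather than flipping independent coins. When assignment probabilities differ across strata, one standard estimator of the average treatment effect weights each stratum's difference in means by that stratum's share of the sample. The sketch below illustrates only this within-strata mechanism and estimator, assuming the strata and probabilities have already been chosen; it does not implement the paper's procedure for estimating a stratification tree from first-wave data. Names are hypothetical, and every stratum must end up with at least one treated and one control unit.

```python
import random
from collections import defaultdict


def stratified_block_assign(strata, probs, seed=0):
    """Treat round(probs[s] * n_s) units in each stratum s, chosen uniformly
    at random without replacement. Returns a 0/1 list aligned with `strata`.
    Note: Python's round() rounds halves to the nearest even integer."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for i, s in enumerate(strata):
        members[s].append(i)
    treat = [0] * len(strata)
    for s, idx in members.items():
        n_treat = round(probs[s] * len(idx))
        for i in rng.sample(idx, n_treat):
            treat[i] = 1
    return treat


def stratified_ate(y, treat, strata):
    """Weight each stratum's treated-minus-control difference in means by the
    stratum's share of the sample. Each stratum needs both arms."""
    groups = defaultdict(lambda: ([], []))  # stratum -> (control ys, treated ys)
    for yi, di, s in zip(y, treat, strata):
        groups[s][di].append(yi)
    n = len(y)
    ate = 0.0
    for control, treated in groups.values():
        share = (len(control) + len(treated)) / n
        ate += share * (sum(treated) / len(treated) - sum(control) / len(control))
    return ate
```

For example, with four units in stratum "a" treated with probability 0.5 and eight in stratum "b" treated with probability 0.25, exactly two units in each stratum are treated.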
In randomized controlled trials, treatment is often assigned by stratified randomization. I show that among all stratified randomization schemes that treat all units with probability one half, a certain matched-pair design achieves the maximum statistical precision for estimating the average treatment effect. In an important special case, the optimal design pairs units according to the baseline outcome. In a simulation study based on datasets from ten randomized controlled trials, this design lowers the standard error for the estimator of the average treatment effect by 10 percent on average, and by up to 34 percent, relative to the original designs. (JEL C13, C21)
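The special case in which units are paired by baseline outcome can be sketched as follows: sort units by the baseline outcome, pair adjacent units, and treat one unit per pair at random, so every unit is treated with probability one half. The estimator shown is the average within-pair difference, which equals the overall difference in means when every pair is intact. This is a minimal illustration assuming an even number of units and no attrition, not the paper's general optimal design; names are hypothetical.

```python
import random


def matched_pair_assign(baseline, seed=0):
    """Pair units with adjacent baseline outcomes and treat one per pair.
    Returns (treat, pairs), where treat is a 0/1 list and pairs holds index tuples."""
    if len(baseline) % 2:
        raise ValueError("need an even number of units")
    rng = random.Random(seed)
    order = sorted(range(len(baseline)), key=lambda i: baseline[i])
    treat = [0] * len(baseline)
    pairs = []
    for k in range(0, len(order), 2):
        a, b = order[k], order[k + 1]
        treat[a if rng.random() < 0.5 else b] = 1  # coin flip within the pair
        pairs.append((a, b))
    return treat, pairs


def pair_diff_in_means(y, treat, pairs):
    """Average of treated-minus-control differences across pairs."""
    diffs = [(y[a] - y[b]) if treat[a] else (y[b] - y[a]) for a, b in pairs]
    return sum(diffs) / len(diffs)
```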
Experiments are a central methodology in the social sciences. Scholars from every discipline regularly turn to experiments. Practitioners rely on experimental evidence in evaluating social programs, policies, and institutions. This book is about how to “think” about experiments. It argues that designing a good experiment is a slow-moving process, given the host of considerations involved, which runs counter to the fast-moving temptations currently available in the social sciences. The book includes discussion of the place of experiments in the social science process, the assumptions underlying different types of experiments, the validity of experiments, the application of different designs, how to arrive at experimental questions, the role of replications in experimental research, and the steps involved in designing and conducting “good” experiments. The goal is to ensure that social science research remains driven by important substantive questions and fully exploits the potential of experiments in a thoughtful manner.
Objectives
The paper studies the impact of predictive policing on crime in a developing country. It also assesses the impact of different types of police training.
Method
We analyze a randomized controlled trial conducted in Montevideo, Uruguay, to assess the implementation of predictive policing software developed in the United States. Half of the precincts were randomly assigned to the software and half to the local crime analysts (the status quo). A second experiment randomly allocated a specially trained police force to targeted patrol areas by shift and day.
Results
No statistically significant differences in crime outcomes were found between the precincts assigned to the foreign predictive software and those assigned to local crime analysts. In the second experiment, for a given set of target locations, the specially trained task force complied more closely with its assigned patrol sites (20% more patrol time) and showed greater potential for reducing crime: compared with the control group (no special training), robberies fell by 30%, but only during high-crime shifts. There is also evidence of a diffusion of benefits to adjacent areas.
Conclusions
The implementation of an international predictive policing software did not outperform local crime analysts in reducing crime, and local crime analysts are more cost-effective. For a given set of target locations, a modest increase in the dosage of a specially trained police force could reduce crime during high-crime times. In developing countries, new policing technologies and training require a deep understanding of the context in order to channel limited resources as efficiently as possible.
Increasing incomes, improving the education system, and reducing morbidity are important areas of impact investment. Whether these changes will actually be achieved is the key question for an investor deciding which social technology or project to invest in. However, the leaders of social projects and programs often focus on measuring immediate outputs rather than on assessing whether their projects and programs have had the expected impact. In this article, we highlight the experience of evaluating the impact of the Health Insurance Subsidy Program (HISP) and describe approaches that can be used to address this and similar problems.
Researchers conducting randomised controlled trials (RCTs) of complex interventions face design and analytical challenges that are not fully addressed in existing guidelines. Further guidance is needed to help ensure that these trials of complex interventions are conducted to the highest scientific standards while maximising the evidence that can be extracted from each trial. The key challenge is how to manage the multiplicity of outcomes required for the trial while minimising false positive and false negative findings. To address this challenge, we formulate three principles to conduct RCTs: (1) outcomes chosen should be driven by the intent and programme theory of the intervention and should thus be linked to testable hypotheses; (2) outcomes should be adequately powered and (3) researchers must be explicit and fully transparent about all outcomes and hypotheses before the trial is started and when the results are reported. Multiplicity in trials of complex interventions should be managed through careful planning and interpretation rather than through post hoc analytical adjustment. For trials of complex interventions, the distinction between primary and secondary outcomes as defined in current guidelines does not adequately protect against false positive and negative findings. Primary outcomes should be defined as outcomes that are relevant based on the intervention intent and programme theory, declared (ie, registered), and adequately powered. The possibility of confirmatory causal inference is limited to these outcomes. All other outcomes (either undeclared and/or inadequately powered) are secondary and inference relative to these outcomes will be exploratory.
In the scope of explainable artificial intelligence, explanation techniques are heavily studied to increase trust in recommender systems. However, studies on explaining recommendations typically target adults in e-commerce or media contexts; e-learning has received less research attention. To address these limits, we investigated how explanations affect adolescents’ initial trust in an e-learning platform that recommends mathematics exercises with collaborative filtering. In a randomized controlled experiment with 37 adolescents, we compared real explanations with placebo and no explanations. Our results show that real explanations significantly increased initial trust when trust was measured as a multidimensional construct of competence, benevolence, integrity, intention to return, and perceived transparency. Yet, this result did not hold when trust was measured one-dimensionally. Furthermore, not all adolescents attached equal importance to explanations and trust scores were high overall. These findings underline the need to tailor explanations and suggest that dynamically learned factors may be more important than explanations for building initial trust. To conclude, we thus reflect upon the need for explanations and recommendations in e-learning in low-stakes and high-stakes situations.
This paper proposes a method to measure the effectiveness of an ethics program at one of the most prominent pawnshop chains in Mexico, surveying a sample of 519 workers. This research presents a novel approach to the investigation of business ethics by conducting a cluster randomized controlled trial to assess effectiveness. We found no evidence that communicating and explaining the existing code of ethics enhanced understanding of it. This may be an example of a failed ethics program, suggesting that other ethics programs may also be ineffective and that companies could be wasting resources on them. We demonstrate that it is possible to implement a cluster randomized controlled trial, which is considered the gold standard in impact evaluation. This should lead to the application of more effective methodologies in the field of business ethics, offering a more comprehensive understanding of the effectiveness of ethics programs.
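In a cluster randomized trial, whole groups are assigned to treatment and every individual inherits their group's assignment. The study's actual clusters are not described in this abstract; the sketch below uses hypothetical branch IDs as clusters. It randomizes half of the clusters and estimates the effect as the difference between the means of cluster-level averages, one simple way to keep the cluster as the unit of analysis. It is an illustration under those assumptions, not the paper's estimator, and the function names are hypothetical.

```python
import random
from statistics import mean


def cluster_assign(cluster_ids, seed=0):
    """Treat half of the distinct clusters (rounded down); every unit
    inherits its cluster's 0/1 assignment."""
    rng = random.Random(seed)
    clusters = sorted(set(cluster_ids))
    treated = set(rng.sample(clusters, len(clusters) // 2))
    return [1 if c in treated else 0 for c in cluster_ids]


def cluster_mean_difference(y, treat, cluster_ids):
    """Difference between the average of treated-cluster means and the average
    of control-cluster means. Assumes treatment is constant within a cluster."""
    by_cluster = {}
    for yi, di, c in zip(y, treat, cluster_ids):
        by_cluster.setdefault(c, (di, []))[1].append(yi)
    treated = [mean(v) for d, v in by_cluster.values() if d == 1]
    control = [mean(v) for d, v in by_cluster.values() if d == 0]
    return mean(treated) - mean(control)
```

Analyzing cluster means, rather than pooling individuals as if they were independently assigned, avoids overstating precision when outcomes are correlated within clusters.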
Counterfactual assessment techniques involving treated and control groups, such as randomized controlled trials, might be used in outcome-based contracts to avoid rewarding or sanctioning service providers for social outcomes that they did not cause. However, few outcome-based contracts adopt payment rules based on counterfactual assessment techniques. Potential explanations are that these techniques are complex and involve substantial transaction costs. In this paper, we develop a formal theoretical model that integrates the literatures on incentives and policy evaluation to propose the following alternative explanation: counterfactual techniques may have counterproductive incentive effects if they reduce the likelihood of payment even when project managers exert sufficient effort to promote the expected interventions. Our model shows that counterfactual assessment may undermine effort when the number of treated subjects is small and there is limited investment per treated subject. Our formal model also suggests that greater experience among contract sponsors may inhibit the adoption of counterfactual assessment. Simulations, descriptive evidence from a unique database of 350 outcome-based contracts designed or initiated throughout the world, and linear probability models are consistent with our predictions. By offering additional explanations of why counterfactual assessment methods are not widespread in outcome-based contracts, and by identifying the boundary conditions under which these methods are used in incentive contracts, this work informs the literature on cross-sector outcome-based contracts and illustrates the use of formal models to develop novel theories in public administration.
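A simple calculation illustrates why small treated samples can make payment uncertain. Assume, hypothetically, that the sponsor pays only if the estimated treatment-control difference in means exceeds a threshold, that both arms have n units, and that the estimate is approximately normal with standard error sd * sqrt(2 / n). The probability of payment then depends on sample size even when the provider's true effect clears the threshold. This is not the paper's formal model; the function name and parameter values are purely illustrative.

```python
import math
from statistics import NormalDist


def payment_probability(true_effect, threshold, sd, n_per_arm):
    """Probability that a treatment-control difference in means exceeds
    `threshold`, assuming equal arms of n_per_arm units, outcome standard
    deviation `sd` in both arms, and a normal approximation."""
    se = sd * math.sqrt(2 / n_per_arm)
    return 1 - NormalDist(mu=true_effect, sigma=se).cdf(threshold)


for n in (20, 200, 2000):
    print(n, round(payment_probability(0.2, 0.1, 1.0, n), 3))
```

With a true effect of 0.2, a threshold of 0.1, and sd = 1, the payment probability is about 0.62 with 20 units per arm but above 0.99 with 2,000 units per arm.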
Background
In recent years, public health policies and their effects on improving health outcomes have been gaining prominence in the economic literature and on the agenda of international organizations.
Objective
This study aims to evaluate the causal effect of the “Pacto pela Saúde” (Pact for Health) program on health policy performance, measured by a Health Vulnerability Index (HVI), in Brazilian municipalities from 2006 to 2013. The “Pacto pela Saúde” program is the current operational standard of the Brazilian Unified Health System (SUS). One of its main guidelines was to improve health policy governance.
Method
The effect on the HVI of the efficiency gains from municipalities' participation in the health policy was estimated using Pearl's Structural Causal Model.
Results
The results indicate a positive and significant impact of efficiency management on reducing health vulnerability in the municipalities. Pearl's causal model and the back-door criterion for causal identification were used to calculate the effects of the “Pacto pela Saúde” program on the HVI.
Conclusion
Using Pearl's method in this study enabled a more comprehensive analysis of the effects of the “Pacto pela Saúde” program on health outcomes; its use is therefore recommended in future research on public policy analysis.
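For reference, the back-door adjustment formula that Pearl's framework applies once a valid adjustment set Z has been identified is shown below. The abstract does not say which variables the study adjusted for, so X (treatment), Y (outcome), and Z (covariates) are generic placeholders, and Z is assumed discrete. Z satisfies the back-door criterion relative to (X, Y) if no variable in Z is a descendant of X and Z blocks every path between X and Y that contains an arrow into X.

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
```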
Some transaction costs reduce producers' incentives to care about the quality of their agricultural products. We present a simple model showing how these attenuating effects can be reduced and how they are affected by unobservable factors among both producers and purchasers, particularly in a low-trust environment. One way to address quality concerns is through third-party certification schemes, which typically certify unobservable attributes of either the product or the production process. However, these schemes are expensive, and actors need to earn higher returns from their activities to make them work. Evaluating the impacts of certification schemes is tricky because farmers self-select into participation, and the poorest farmers do not participate. Current evidence, however, suggests these schemes do have positive income effects for participating farmers.
Agricultural value chains take on several different organizational forms, from being dominated completely by spot markets to being vertically integrated within a single company. We consider a conceptual model of factors leading to different value chain governance structures; then we adapt this model to African value chains by considering contextual factors, such as the abundance of smallholders and the fear that market power often resides with the trader in African value chains. We note that relational contracting plays a very important role in African value chains; transactions along value chains in Africa are typically based on implicit, self-enforcing contracts with little or no third-party enforcement. Transaction costs that lead to relational contracting simply reflect the economic and technological conditions at play.
This book provides a thorough introduction to and examination of agricultural value chains in Sub-Saharan Africa. First, the authors introduce the economic theory of agri-food value chains and value chain governance, focusing on domestic and regional trade in (and consumption of) food crops in a low-income country context. In addition to mainstream and heterodox thinking about value chain development, the book pays attention to political economy considerations. The book also reviews the empirical evidence on value chain development and performance in Africa. It adopts multiple lenses to examine agricultural value chains, zooming out from the micro level (e.g., relational contracting in a context of market imperfections) to the meso level (e.g., distributional implications of various value chain interventions, inclusion of specific social groups) and the macro level (underlying income, population and urbanization trends, volumes and prices, etc.). Furthermore, this book places value chain development in the context of a process the authors refer to as structural transformation 2.0, in which production factors (labor, land and capital) move from low-productivity agriculture to high-productivity agriculture. Finally, throughout the book the authors interpret the evidence in light of three important debates: (i) how competitive are rural factor and product markets, and what does this imply for distribution and innovation? (ii) what role do foreign investment and factor proportions play in the development of agri-food value chains in Africa? (iii) what complementary government policies can help facilitate agricultural value chain transformation towards high-productivity activities and enhance the capacity of value chains to generate employment opportunities and food security for a growing population?
Alan de Brauw is a Senior Research Fellow at the International Food Policy Research Institute. He was previously a professor of economics at Williams College. He conducts much of his research using primary source data and has previously published over 50 articles in economics, agricultural economics, and nutrition journals.
Erwin Bulte is professor of development economics at Wageningen University and Research. He has previously held positions at Oxford University, Cambridge University, Tilburg University and Utrecht University. He has published almost 150 papers in internationally refereed journals, and a previous Palgrave book on institutions and agrarian development in West Africa (with Paul Richards and Maarten Voors).
Poor storage causes additional problems for smallholder farmers, as they are pressured to sell crops immediately after harvest. As a result, in many years prices for major grains in Africa fall right after harvest and peak just before the next one. Poor storage can also lead to post-harvest losses. Yet good measurements of post-harvest losses are scarce, particularly for vegetables; because information on actual losses is poor, it is difficult to design cost-effective interventions to reduce them. With improved storage, farmers could benefit from higher prices later in the season. More regional storage can also support warehouse receipt systems, which can be used both as collateral and to develop commodity exchanges. Yet again, the transaction costs of using regional storage are high for smallholders.
Smallholder farmers in Africa are poor and appear unproductive relative to larger farmers. But once one takes their environment into account, we argue they make rational production decisions given their multiple objectives under the multiple constraints they face. These constraints are shaped by transaction costs, which determine what smallholders can buy and sell. Transaction costs include not just transporting goods to market, but also costs of aggregation, dealing with risk, obtaining liquidity, and costs related to trust, market power, and even storage. The remainder of the book, then, provides historical and institutional reasons why African smallholders face high transaction costs. After explaining why some solutions will likely fail, the book concludes with what we consider promising areas for interventions to catalyze Structural Transformation 2.0 in Africa.
Smallholder production in Africa tends to be low-yielding relative to its agronomic potential, and crops are often of low or variable quality. These outcomes are largely a result of the market conditions smallholders face. Smallholders lack full property rights over land, and capital markets targeting smallholders are thin, so they may not be able to purchase enough inputs. Inputs are often costly because of the relatively large distances they must travel, and demand for them is further reduced because farmers may lack information about the right amounts to use and because they lack capital. Farmers may also distrust inputs because of perceived counterfeiting or other risks. When selling on output markets, smallholders often face weak returns to quality due to imperfect competition. Even within households, these challenges can differ; women may face stricter constraints on their production than men do.
African agricultural value chains have gradually evolved from informal exchange to more formalization in general, yet this process has not been linear in time. Policy changes between colonial and post-colonial regimes first shifted at least some smallholders into more formalized markets, and then back to selling surplus on spot markets. The colonial era can be characterized as extractive; institutions were developed to extract value from Africa and provide cheap food to Europe, particularly tropical commodities. Many post-colonial governments continued to implicitly tax agriculture through urban bias and pricing, tariff, or exchange rate policies until structural adjustment occurred in the 1990s. Since then, several factors have improved African agricultural performance, including an infusion of FDI and private sector investments and changes in agricultural policy in Europe improving African terms of trade.
In 2014 the South African government implemented a youth employment tax incentive (YETI) scheme to address the high rate of youth unemployment. Its adoption has been hailed as a success story for evidence‐based policy. This article critically assesses that claim, focusing on the randomized trial of a wage‐subsidy voucher that was used to justify the adoption of the policy and econometric analyses of the incentive's efficacy that were used to justify its renewal. That evidence is shown to be materially flawed. The design of the randomized trial meant that its relevance to the policy question was limited and critical issues pertaining to the estimated effect of the intervention, external validity and scale‐up were not addressed. The process was similarly flawed, with evidence only made public after critical legislative decisions had been taken. Analysis of that process shows how supposedly rigorous evidence was used to obscure the limitations and risks of the proposal in service of pre‐existing positions, vested interests and political imperatives. This implies a high opportunity cost for the tax expenditures incurred through the incentive. The South African YETI thereby serves as a cautionary tale on randomized trials and the political economy of evidence‐based policy, particularly in developing countries.
COVID-19 continues to spread across the globe at exponential speed, infecting millions and overwhelming even the best-prepared healthcare systems. There are growing concerns that healthcare systems in low- and middle-income countries (LMICs) are largely unprepared to combat the virus because of limited resources. The problems in LMICs are exacerbated by the fact that citizens in these countries generally have low trust in the healthcare system because of its low quality, which could trigger a number of uncooperative behaviors. In this paper, we focus on one such behavior and investigate the relationship between trust in the healthcare system and the probability of potential treatment-seeking behavior when the first symptoms of COVID-19 appear. First, we provide motivating evidence from a unique national online survey administered in Armenia, a post-Soviet LMIC. We then present results from a large-scale survey experiment in Armenia that provides causal evidence supporting the investigated relationship. Our main finding is that a more trustworthy healthcare system increases the probability of potential treatment-seeking behavior when initial symptoms are observed.
Despite decades of investment in agricultural extension, technology adoption among farmers and agricultural productivity growth in Sub‐Saharan Africa remain slow. Among other shortcomings, extension systems often make recommendations that do not account for price risk or spatial heterogeneity in farmers' growing conditions. However, little is known about the effectiveness of extension approaches for nutrient management that consider these issues. We analyze the impact of farmers' access to site‐specific nutrient management recommendations and to information on expected returns, provided through a digital decision support tool, for maize production. We implement a randomized controlled trial among smallholders in the maize belt of northern Nigeria. We use three waves of annual panel data to estimate immediate and longer term effects of two different extension treatments: site‐specific recommendations with and without complementary information about variability in output prices and expected returns. We find that site‐specific nutrient management recommendations improve fertilizer management practices and maize yields but do not necessarily increase fertilizer use. In addition, we find that recommendations that are accompanied by additional information about variability in expected returns induce larger fertilizer investments that persist beyond the first year. However, the magnitudes of these effects are small: we find only incremental increases in investments and net revenues over two treatment years.
When governments and healthcare providers offer people cash rewards for weight loss, the underlying assumption is that cash rewards are versatile, working equally well for everyone, for example, for all genders. No research to date has tested for gender differences in response to financial incentives for weight loss. We show in a randomized controlled trial (RCT) (n = 472) that cash incentives for weight loss worked only for males. The RCT consisted of a 3-month, self-administered online weight loss program. Offering a US$150 incentive for a 5% weight loss more than tripled the proportion of males who were successful compared with a no-incentive Control arm (20.9% vs. 5.9%). On average, males in the incentive arm lost 2.4% of their weight over 3 months, compared with 0.9% in the Control arm. The same incentive had no such effect on females: the average weight loss in the incentive arm was not significantly different from that in the Control arm (1.03% and 1.44%, respectively), nor was the proportion of participants meeting the 5% weight loss goal (8.6% and 8.7%, respectively). This study shows that males respond better than females to financial incentives for weight loss.
Some research suggests women are more likely than men to allocate additional resources to their children. This perception has influenced policies such as in-kind food transfer programmes and cash transfer programmes, which often target women recipients. We assess whether targeting in-kind rice transfers to female versus male adult household members has a differential impact on children’s short-run nutritional status. We estimate the impacts of transfers of edible rice and rice seeds, randomly allocated to female or male adults, on three anthropometric indicators: BMI-for-age, arm-muscle area, and triceps skinfold thickness. The trial includes 481 children aged 3–11 years in a horticultural-foraging society of native Amazonians in Bolivia. On average, the gender of the transfer recipient does not influence child anthropometric dimensions, possibly due to norms of cooperation and sharing within and between households. We find limited evidence of heterogeneity in impacts. Transfers to women help children who were growth-stunted at baseline partially catch up to their better-nourished age-sex peers, and help boys (but not girls) and children in higher-income households increase their BMI-for-age. The results of this research point to the importance of considering cultural context in determining whether allocating food transfers according to gender is most effective.