Article

The Science of Using Science: Towards an Understanding of the Threats to Scalability


Abstract

Policymakers are increasingly facing the challenge of scaling empirical insights. This study provides a theoretical lens into the science of how to use science. Through a simple model, we highlight three elements of the scale-up problem: (1) when evidence becomes actionable; (2) properties of the population; and (3) properties of the situation. Until these three areas are fully understood, the threats to scalability will render any scaling exercise particularly vulnerable. Accordingly, our work represents a call for more policy-based evidence, whereby the nature and extent of the various threats to scalability are explored in the original research program.


Article
This paper combines two new summer youth employment experiments in Chicago and Philadelphia with previously published evidence to show how repeated study of an intervention as it scales and changes contexts can guide decisions about public investment. Two sources of treatment heterogeneity can undermine the scale-up and replication of successful human capital interventions: variation in the treatment itself and in individual responsiveness. Results show that these programs generate consistently large proportional decreases in criminal justice involvement, even as administrators recruit additional youth, hire new local providers, find more job placements, and vary the content of their programs. Using both endogenous stratification within cities and variation in 62 new and existing point estimates across cities uncovers a key pattern of individual responsiveness: impacts grow linearly with the risk of socially costly behavior each person faces. Identifying more interventions that combine this pattern of robustness to treatment variation with bigger effects for the most disconnected could aid efforts to reduce social inequality efficiently.
Article
The promise of randomized controlled trials is that evidence gathered through the evaluation of a specific program helps us—possibly after several rounds of fine-tuning and multiple replications in different contexts—to inform policy. However, critics have pointed out that a potential constraint in this agenda is that results from small “proof-of-concept” studies run by nongovernment organizations may not apply to policies that can be implemented by governments on a large scale. After discussing the potential issues, this paper describes the journey from the original concept to the design and evaluation of scalable policy. We do so by evaluating a series of strategies that aim to integrate the nongovernment organization Pratham’s “Teaching at the Right Level” methodology into elementary schools in India. The methodology consists of reorganizing instruction based on children’s actual learning levels, rather than on a prescribed syllabus, and has previously been shown to be very effective when properly implemented. We present evidence from randomized controlled trials involving some designs that failed to produce impacts within the regular schooling system but still helped shape subsequent versions of the program. As a result of this process, two versions of the programs were developed that successfully raised children’s learning levels using scalable models in government schools. We use this example to draw general lessons about using randomized control trials to design scalable policies.
Article
Randomized Controlled Trials (RCTs) are increasingly popular in the social sciences, not only in medicine. We argue that the lay public, and sometimes researchers, put too much trust in RCTs over other methods of investigation. Contrary to frequent claims in the applied literature, randomization does not equalize everything other than the treatment in the treatment and control groups, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) covariates. Finding out whether an estimate was generated by chance is more difficult than commonly believed. At best, an RCT yields an unbiased estimate, but this property is of limited practical value. Even then, estimates apply only to the sample selected for the trial, often no more than a convenience sample, and justification is required to extend the results to other groups, including any population to which the trial sample belongs, or to any individual, including an individual in the trial. Demanding 'external validity' is unhelpful because it expects too much of an RCT while undervaluing its potential contribution. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon, not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not 'what works', but 'why things work'.
Article
Objective: Clinical trials have long been considered the ‘gold standard’ of research-generated evidence in health care. Patient recruitment is an important determinant of the success of trials, yet little focus is placed on patients' decision-making process around recruitment. Our objective was to identify the key factors pertaining to patient participation in clinical trials, to better understand the low participation rate observed in one clinical research facility in Ireland. Design: Narrative literature review of studies focussing on factors which may act to facilitate or deter patient participation in clinical trials. Studies were identified from Medline, PubMed, Cochrane Library and CINAHL. Results: Sixty-one studies were included in the narrative review: forty-eight of these papers focused specifically on the patient's perspective of participating in clinical trials; the remaining thirteen related to the perspectives of carers, family and health care professionals. The primary factors influencing patient participation in clinical trials were personal, and these were collectively associated with obtaining some form of personal gain through participation. Cancer was identified as the leading disease entity included in clinical trials, followed by HIV and cardiovascular disease. Conclusion: The vast majority of literature relating to participation in clinical trials emanates from high-income countries, with 63% originating from the USA. No studies from low-income or developing countries were identified for inclusion in this review, which limits the generalizability of the influencing factors.
Article
A decade ago, the Society of Prevention Research (SPR) endorsed a set of standards for evidence related to research on prevention interventions. These standards (Flay et al., Prevention Science 6:151-175, 2005) were intended in part to increase consistency in reviews of prevention research that often generated disparate lists of effective interventions due to the application of different standards for what was considered to be necessary to demonstrate effectiveness. In 2013, SPR's Board of Directors decided that the field has progressed sufficiently to warrant a review and, if necessary, publication of "the next generation" of standards of evidence. The Board convened a committee to review and update the standards. This article reports on the results of this committee's deliberations, summarizing changes made to the earlier standards and explaining the rationale for each change. The SPR Board of Directors endorses "The Standards of Evidence for Efficacy, Effectiveness, and Scale-up Research in Prevention Science: Next Generation."
Article
To test the hypothesis that the percentage of screened patients who randomize differs between prevention and therapy trials. Rapid review of randomized controlled trials (RCTs) identified through published systematic reviews in August 2013. Individually randomized, parallel-group controlled RCTs were eligible if they evaluated metformin monotherapy or exercise for the prevention or treatment of type 2 diabetes. Numbers of patients screened and randomized were extracted by a single reviewer. Percentages randomized were calculated for each study as a function of those approached, screened, and eligible. Percentages (95% confidence intervals) from each individual study were weighted according to the denominator and pooled rates calculated. Statistical heterogeneity was assessed using I². The percentage of those screened who subsequently randomized was 6.2% (6.0%, 6.4%; 3 studies, I² = 100.0%) for metformin prevention trials; 50.7% (49.9%, 51.4%; 21 studies, I² = 99.6%) for metformin treatment trials; 4.8% (4.7%, 4.8%; 14 studies, I² = 99.9%) for exercise prevention trials; and 43.3% (42.6%, 43.9%; 28 studies, I² = 99.8%) for exercise treatment trials. This study provides qualified support for the hypothesis that prevention trials recruit a smaller proportion of those screened than treatment trials. Statistical heterogeneity associated with pooled estimates and other study limitations are discussed.
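The pooling described above can be reproduced in a few lines. The following is a minimal sketch with made-up counts rather than the study's data: per-study percentages randomized are pooled with denominator weights, and I² is derived from Cochran's Q using inverse-variance weights; the function and variable names are illustrative.

import numpy as np

def pooled_percentage_and_i2(randomized, screened):
    """Pool per-study percentages randomized and compute the I^2 heterogeneity statistic."""
    x = np.asarray(randomized, dtype=float)
    n = np.asarray(screened, dtype=float)
    p = x / n                                   # per-study proportion randomized
    pooled = x.sum() / n.sum()                  # pooled proportion, weighted by the denominator
    var = p * (1 - p) / n                       # binomial variance of each proportion
    w = 1.0 / var                               # inverse-variance weights for Cochran's Q
    mu = np.sum(w * p) / np.sum(w)              # inverse-variance weighted mean
    q = float(np.sum(w * (p - mu) ** 2))        # Cochran's Q
    df = len(p) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return 100 * pooled, i2

# Made-up counts for three hypothetical studies (randomized, screened):
print(pooled_percentage_and_i2([60, 45, 80], [1000, 900, 1300]))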
Article
Experimental economics represents a strong growth industry. In the past several decades the method has expanded beyond intellectual curiosity, now meriting consideration alongside the other more traditional empirical approaches used in economics. Accompanying this growth is an influx of new experimenters who are in need of straightforward direction to make their designs more powerful. This study provides several simple rules of thumb that researchers can apply to improve the efficiency of their experimental designs. We buttress these points by including empirical examples from the literature.
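One of the most basic such rules concerns sample size. The sketch below is a generic power calculation, not code from the paper: it returns the number of subjects per arm needed to detect a given mean difference between two equal-sized treatment arms; the effect size, significance level, and power shown are illustrative defaults.

from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per arm to detect a mean difference delta between two equal-sized
    arms with common outcome standard deviation sigma (two-sided test)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

# Detecting a 0.2 standard-deviation effect at 5% significance with 80% power:
print(round(n_per_arm(delta=0.2, sigma=1.0)))   # roughly 393 subjects per arm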
Article
Summary: The current system of publication in biomedical research provides a distorted view of the reality of scientific data that are generated in the laboratory and clinic. This system can be studied by applying principles from the field of economics. The "winner's curse," a more general statement of publication bias, suggests that the small proportion of results chosen for publication are unrepresentative of scientists' repeated samplings of the real world. The self-correcting mechanism in science is retarded by the extreme imbalance between the abundance of supply (the output of basic science laboratories and clinical investigations) and the increasingly limited venues for publication (journals with sufficiently high impact). This system would be expected intrinsically to lead to the misallocation of resources. The scarcity of available outlets is artificial, based on the costs of printing in an electronic age and a belief that selectivity is equivalent to quality. Science is subject to great uncertainty: we cannot be confident now which efforts will ultimately yield worthwhile achievements. However, the current system abdicates to a small number of intermediates an authoritative prescience to anticipate a highly unpredictable future. In considering society's expectations and our own goals as scientists, we believe that there is a moral imperative to reconsider how scientific data are judged and disseminated.
Article
We describe the use of a conceptual framework and implementation protocol to prepare effective health services interventions for implementation in community-based (i.e., non-academic-affiliated) settings. The framework is based on the experiences of the U.S. Centers for Disease Control and Prevention (CDC) Replicating Effective Programs (REP) project, which has been at the forefront of developing systematic and effective strategies to prepare HIV interventions for dissemination. This article describes the REP framework, and how it can be applied to implement clinical and health services interventions in community-based organizations. REP consists of four phases: pre-conditions (e.g., identifying need, target population, and suitable intervention), pre-implementation (e.g., intervention packaging and community input), implementation (e.g., package dissemination, training, technical assistance, and evaluation), and maintenance and evolution (e.g., preparing the intervention for sustainability). Key components of REP, including intervention packaging, training, technical assistance, and fidelity assessment are crucial to the implementation of effective interventions in health care. REP is a well-suited framework for implementing health care interventions, as it specifies steps needed to maximize fidelity while allowing opportunities for flexibility (i.e., local customizing) to maximize transferability. Strategies that foster the sustainability of REP as a tool to implement effective health care interventions need to be developed and tested.
Article
This article provides a brief history of evidence-based policy, which it defines as encompassing (1) the application of rigorous research methods, particularly randomized controlled trials (RCTs), to build credible evidence about “what works” to improve the human condition; and (2) the use of such evidence to focus public and private resources on effective interventions. Evidence-based policy emerged first in medicine after World War II, and has made tremendous contributions to human health. In social policy, a few RCTs were conducted before 1980, but the number grew rapidly in U.S. welfare and employment programs during the 1980s and 1990s and had an important impact on government policy. Since 2000, evidence-based policy has seen a major expansion in other social policy areas, including education and international development assistance. A recent milestone is the U.S. enactment of “tiered evidence” social programs in which rigorous evidence is the defining principle in awarding government funding for interventions.
Article
This paper makes the case for greater use of randomized experiments “at scale.” We review various critiques of experimental program evaluation in developing countries, and discuss how experimenting at scale along three specific dimensions—the size of the sampling frame, the number of units treated, and the size of the unit of randomization—can help alleviate the concerns raised. We find that program-evaluation randomized controlled trials published over the last 15 years have typically been “small” in these senses, but also identify a number of examples—including from our own work—demonstrating that experimentation at much larger scales is both feasible and valuable.
Article
Economists often conduct experiments that demonstrate the benefits to individuals of modifying their behavior, such as using a new production process at work or investing in energy saving technologies. A common occurrence is for the success of the intervention in these small-scale studies to diminish substantially when applied at a larger scale, severely undermining the optimism advertised in the original research studies. One key contributor to the lack of general success is that the change that has been demonstrated to be beneficial is not adopted to the extent that would be optimal. This problem is isomorphic to the problem of patient non-adherence to medications that are known to be effective. The large medical literature on countermeasures furnishes economists with potential remedies to this manifestation of the scaling problem.
Article
Purpose: Whether the ASCO Value Framework and the European Society for Medical Oncology (ESMO) Magnitude of Clinical Benefit Scale (MCBS) measure similar constructs of clinical benefit is unclear. It is also unclear how they relate to quality-adjusted life-years (QALYs) and funding recommendations in the United Kingdom and Canada. Methods: Randomized clinical trials of oncology drug approvals by the US Food and Drug Administration, European Medicines Agency, and Health Canada between 2006 and August 2015 were identified and scored using the ASCO version 1 (v1) framework, ASCO version 2 (v2) framework, and ESMO-MCBS by at least two independent reviewers. Spearman correlation coefficients were calculated to assess construct validity (between frameworks) and criterion validity (against QALYs from the National Institute for Health and Care Excellence [NICE] and the pan-Canadian Oncology Drug Review [pCODR]). Associations between scores and NICE/pCODR recommendations were examined. Inter-rater reliability was assessed using intraclass correlation coefficients. Results: From 109 included randomized clinical trials, 108 ASCOv1, 111 ASCOv2, and 83 ESMO scores were determined. Correlation coefficients for ASCOv1 versus ESMO, ASCOv2 versus ESMO, and ASCOv1 versus ASCOv2 were 0.36 (95% CI, 0.15 to 0.54), 0.17 (95% CI, −0.06 to 0.37), and 0.50 (95% CI, 0.35 to 0.63), respectively. Compared with NICE QALYs, correlation coefficients were 0.45 (ASCOv1), 0.53 (ASCOv2), and 0.46 (ESMO); with pCODR QALYs, coefficients were 0.19 (ASCOv1), 0.20 (ASCOv2), and 0.36 (ESMO). None of the frameworks were significantly associated with NICE/pCODR recommendations. Inter-rater reliability was good for all frameworks. Conclusion: The weak-to-moderate correlations of the ASCO frameworks with the ESMO-MCBS, together with their correlations with QALYs and with NICE/pCODR funding recommendations, suggest that the frameworks measure different constructs of clinical benefit. Construct convergent validity with the ESMO-MCBS did not increase with the updated ASCO framework.
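As a rough illustration of the construct-validity comparison (a sketch with invented scores, not the study's data), the snippet below computes a Spearman correlation between two vectors of framework scores and a percentile-bootstrap confidence interval.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical per-trial scores from two value frameworks (not the study's data).
asco_scores = np.array([22, 35, 48, 15, 60, 41, 30, 55, 18, 44], dtype=float)
esmo_scores = np.array([2, 3, 4, 1, 5, 3, 2, 4, 1, 4], dtype=float)

rho, _ = spearmanr(asco_scores, esmo_scores)

# Percentile bootstrap for a rough 95% confidence interval.
boot = []
for _ in range(5000):
    idx = rng.integers(0, len(asco_scores), len(asco_scores))
    boot.append(spearmanr(asco_scores[idx], esmo_scores[idx])[0])
ci_low, ci_high = np.nanpercentile(boot, [2.5, 97.5])
print(f"Spearman rho = {rho:.2f} (bootstrap 95% CI {ci_low:.2f} to {ci_high:.2f})")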
Article
Policymakers often consider interventions at the scale of the population, or some other large scale. One of the sources of information about the potential effects of such interventions is experimental studies conducted at a significantly smaller scale. A common occurrence is for the treatment effects detected in these small-scale studies to diminish substantially in size when applied at the larger scale that is of interest to policymakers. This paper provides an overview of the main reasons for a breakdown in scalability. Understanding the principal mechanisms represents a first step toward formulating countermeasures that promote scalability.
Book
Few forms of market exchange intrigue economists as do auctions, whose theoretical and practical implications are enormous. John Kagel and Dan Levin, complementing their own distinguished research with papers written with other specialists, provide a new focus on common value auctions and the "winner's curse." In such auctions the value of each item is about the same to all bidders, but different bidders have different information about the underlying value. Virtually all auctions have a common value element; among the burgeoning modern-day examples are those organized by Internet companies such as eBay. Winners end up cursing when they realize that they won because their estimates were overly optimistic, which led them to bid too much and lose money as a result. The authors first unveil a fresh survey of experimental data on the winner's curse. Melding theory with the econometric analysis of field data, they assess the design of government auctions, such as the spectrum rights (air wave) auctions that continue to be conducted around the world. The remaining chapters gauge the impact on sellers' revenue of the type of auction used and of inside information, show how bidders learn to avoid the winner's curse, and present comparisons of sophisticated bidders with college sophomores, the usual guinea pigs used in laboratory experiments. Appendixes refine theoretical arguments and, in some cases, present entirely new data. This book is an invaluable, impeccably up-to-date resource on how auctions work--and how to make them work.
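The mechanism behind the winner's curse can be illustrated with a short simulation. This is a hedged sketch, not material from the book: each bidder observes a noisy signal of a value common to all bidders and naively bids its own estimate, so the auction is won by the most optimistic bidder and the winner's average realized profit is negative. All parameters are made up.

import numpy as np

rng = np.random.default_rng(1)

def naive_common_value_auction(n_bidders=6, noise_sd=10.0, n_auctions=100_000):
    """Average realized profit of the winner when every bidder naively bids its own
    noisy estimate of a value that is common to all bidders."""
    value = rng.uniform(50, 150, size=n_auctions)                       # true common value
    signals = value[:, None] + rng.normal(0, noise_sd, (n_auctions, n_bidders))
    winning_bid = signals.max(axis=1)                                   # most optimistic estimate wins
    return float(np.mean(value - winning_bid))                          # negative: the winner overpays

print(naive_common_value_auction())   # roughly -1.27 * noise_sd with 6 bidders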
Article
Some researchers have argued that anchoring in economic valuations casts doubt on the assumption of consistent and stable preferences. We present new evidence that explores the strength of certain anchoring results. We then present a theoretical framework that provides insights into why we should be cautious of initial empirical findings in general. The model importantly highlights that the rate of false positives depends not only on the observed significance level, but also on statistical power, research priors, and the number of scholars exploring the question. Importantly, a few independent replications dramatically increase the chances that the original finding is true.
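The abstract's claim about false positives can be made concrete with the standard post-study-probability calculation. The sketch below is a generic Bayesian illustration rather than the paper's exact model: given a prior probability that the tested effect is real, the significance level, and statistical power, Bayes' rule gives the probability that a significant finding is true, and each independent successful replication multiplies the odds by power divided by alpha. The numbers used are illustrative.

def post_study_probability(prior, alpha=0.05, power=0.80, successful_replications=0):
    """Probability that a statistically significant finding reflects a true effect,
    given a prior, the significance level, power, and the number of independent
    successful replications."""
    bayes_factor = power / alpha                 # P(significant | true) / P(significant | false)
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor ** (1 + successful_replications)
    return posterior_odds / (1 + posterior_odds)

for r in range(3):
    print(r, round(post_study_probability(prior=0.10, successful_replications=r), 3))
# With a 10% prior: 0.64 after the original finding, 0.966 after one successful
# replication, 0.998 after two.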
Article
The revised Society for Prevention Research (SPR) standards of evidence are an exciting advance in the field of prevention science. We appreciate the committee's vision that the standards represent goals to aspire to rather than a set of benchmarks for where prevention science is currently. The discussion about the standards highlights how much has changed in the field over the last 10 years, and as knowledge, theory, and methods continue to advance, the new standards push the field toward increasing rigor and relevance. This commentary discusses how the revised standards support the work of translating high-quality evaluations into evidence-based policy and of helping evidence-based programs implement at scale. The commentary ends by raising two areas, generating evidence at scale and transparency of research, as additional areas for consideration in future standards.
Article
“Site selection bias” can occur when the probability that a program is adopted or evaluated is correlated with its impacts. I test for site selection bias in the context of the Opower energy conservation programs, using 111 randomized control trials involving 8.6 million households across the United States. Predictions based on rich microdata from the first 10 replications substantially overstate efficacy in the next 101 sites. Several mechanisms caused this positive selection. For example, utilities in more environmentalist areas are more likely to adopt the program, and their customers are more responsive to the treatment. Also, because utilities initially target treatment at higher-usage consumer subpopulations, efficacy drops as the program is later expanded. The results illustrate how program evaluations can still give systematically biased out-of-sample predictions, even after many replications. JEL Codes: C93, D12, L94, O12, Q41.
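The adoption mechanism described in the abstract can be mimicked with a small simulation, offered as a sketch under assumed parameters rather than the paper's model: site-level treatment effects vary, sites with larger effects are more likely to adopt early, and the average effect among early adopters therefore overstates the average across all sites.

import numpy as np

rng = np.random.default_rng(2)

n_sites = 1000
true_effects = rng.normal(loc=1.0, scale=0.5, size=n_sites)     # heterogeneous site-level effects

# Sites with larger effects are more likely to adopt (and be evaluated) early.
adopt_prob = 1 / (1 + np.exp(-3 * (true_effects - 1.0)))
early_adopter = rng.random(n_sites) < adopt_prob

print("mean effect, all sites:      ", round(float(true_effects.mean()), 3))
print("mean effect, early adopters: ", round(float(true_effects[early_adopter].mean()), 3))
# The early adopters' average exceeds the all-site average, so extrapolating from
# early evaluations overstates efficacy at full scale.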
Article
A commonly held view is that laboratory experiments provide researchers with more "control" than natural field experiments. This paper explores how natural field experiments can provide researchers with more control than laboratory experiments. While laboratory experiments provide researchers with a high degree of control over the environment in which participants agree to be experimental subjects, when participants systematically opt out of laboratory experiments, the researcher's ability to manipulate certain variables is limited. In contrast, natural field experiments bypass the participation decision altogether due to their covertness, and they allow for a potentially more diverse participant pool within the market of interest.
Article
Nonadherence to prescription medication is common and costly.1 On average, 50% of medications for chronic diseases are not taken as prescribed.2 Medication nonadherence is widespread, and accountability for this issue is shared by patients, their caregivers, clinicians, and the health care system as a whole. Furthermore, there is an increasing business case for addressing medication nonadherence; as payment and delivery system models evolve to place health care organizations and clinicians at risk for patient outcomes and downstream costs (eg, bundled payments and accountable care organizations), interest in coordination of care and invention of durable treatments continues to increase.
Article
Domestic attempts to use financial incentives for teachers to increase student achievement have been ineffective. In this paper, we demonstrate that exploiting the power of loss aversion—teachers are paid in advance and asked to give back the money if their students do not improve sufficiently—increases math test scores between 0.201 (0.076) and 0.398 (0.129) standard deviations. This is equivalent to increasing teacher quality by more than one standard deviation. A second treatment arm, identical to the loss aversion treatment but implemented in the standard fashion, yields smaller and statistically insignificant results. This suggests it is loss aversion, rather than other features of the design or population sampled, that leads to the stark differences between our findings and past research.
Article
Research Findings: Early Head Start home-based programs provide services through weekly home visits to families with children up to age 3, but families vary in how long they remain enrolled. In this study of 564 families in home-based Early Head Start programs, “dropping out” was predicted by specific variations in home visits and certain family characteristics. It also was negatively related to several targeted program outcomes. Home visits to dropout families focused less on child development, were less successful at engaging parents, and had more distractions. Dropout families had more risks and changes of residence, were more likely to be headed by a single mother, and were less likely to have a mother with poor English skills or a child with a documented disability. Practice or Policy: Home visiting programs may be able to reduce dropout rates, and thereby increase the duration of services to each family, by keeping home visits engaging and focused on child development and also by individualizing to the specific needs of families at risk for dropping out. To keep families involved longer, home visiting programs should consider (a) planning home visits that are longer, more engaging for both parent and child, scheduled at a time when there are fewer distractions for the family; and (b) spending the majority of time on child development activities and topics.
Article
This study presents an overview of modern field experiments and their usage in economics. Our discussion focuses on three distinct periods of field experimentation that have influenced the economics literature. The first might well be thought of as the dawn of "field" experimentation: the work of Neyman and Fisher, who laid the experimental foundation in the 1920s and 1930s by conceptualizing randomization as an instrument to achieve identification via experimentation with agricultural plots. The second, the large-scale social experiments conducted by government agencies in the mid-twentieth century, moved the exploration from plots of land to groups of individuals. More recently, the nature and range of field experiments has expanded, with a diverse set of controlled experiments being completed outside of the typical laboratory environment. With this growth, the number and types of questions that can be explored using field experiments has grown tremendously. After discussing these three distinct phases, we speculate on the future of field experimental methods, a future that we envision including a strong collaborative effort with outside parties, most importantly private entities.
Article
This article is a literature summary and annotated bibliography of research on recruitment for controlled clinical trials published through 1995. It extends and revises a similar review published in this journal a decade ago. The current commentary focuses on intervening developments in recruitment, including diverse populations, HIV trials, primary prevention trials, recruitment strategies, overall planning and management, patient and physician attitudes, adherence, generalizability, and cost. Profound barriers may exist in the recruitment of diverse populations, involving language, cultural factors, beliefs about medical research, and the appropriateness of available protocols. Extensive literature exists on patient and physician barriers to participation. Trials in HIV-infected or AIDS-diagnosed individuals introduce special considerations, including issues of confidentiality, parallel track design, and populations difficult to define and track. Recruitment strategies such as patient registries, occupational screening, direct mail, and the media are now prominent in the literature. Successful planning and management of an overall recruitment plan include piloting strategies, monitoring recruitment by data tracking systems, and hiring quality staff. Generalizability of study results is influenced by the characteristics of participants and by their adherence to study protocol. With increasingly limited funding to conduct clinical trials, efforts to quantify and reduce recruitment costs are being made. While over 4000 titles were identified, primarily by MEDLINE literature search, the articles summarized emphasize data-supported and -confirmed conclusions, and broad coverage of disease areas. We annotate here 91 outstanding articles useful for formulation of overall recruitment approaches in clinical trials.
Article
About half of 2,581 low-income mothers reported reading daily to their children. At 14 months, the odds of reading daily increased by the child being firstborn or female. At 24 and 36 months, these odds increased by maternal verbal ability or education and by the child being firstborn or of Early Head Start status. White mothers read more than did Hispanic or African American mothers. For English-speaking children, concurrent reading was associated with vocabulary and comprehension at 14 months, and with vocabulary and cognitive development at 24 months. A pattern of daily reading over the 3 data points for English-speaking children and daily reading at any 1 data point for Spanish-speaking children predicted children's language and cognition at 36 months. Path analyses suggest reciprocal and snowballing relations between maternal bookreading and children's vocabulary.
Article
There has been a dramatic increase in the use of experimental methods in the past two decades. An oft-cited reason for this rise in popularity is that experimental methods provide the necessary control to estimate treatment effects in isolation of other confounding factors. We examine the relevance of experimental findings from laboratory settings that abstract from the field context of the task that theory purports to explain. Using common value auction theory as our guide, we identify naturally occurring settings in which one can test the theory. In our treatments the subjects are not picked at random, as in lab experiments with student subjects, but are deliberately identified by their trading roles in the natural field setting. We find that experienced agents bidding in familiar roles do not fall prey to the winner's curse. Yet, when experienced agents are observed bidding in an unfamiliar role, we find that they frequently fall prey to the winner's curse. We conclude that the theory predicts field behavior well when one is able to identify naturally occurring field counterparts to the key theoretical conditions.
Article
This paper investigates four topics. (1) It examines the different roles played by the propensity score (probability of selection) in matching, instrumental variable and control functions methods. (2) It contrasts the roles of exclusion restrictions in matching and selection models. (3) It characterizes the sensitivity of matching to the choice of conditioning variables and demonstrates the greater robustness of control function methods to misspecification of the conditioning variables. (4) It demonstrates the problem of choosing the conditioning variables in matching and the failure of conventional model selection criteria when candidate conditioning variables are not exogenous.
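As a concrete, hedged illustration of the propensity score's role in matching (a generic sketch on simulated data, not the paper's estimators): estimate the probability of selection from observed covariates, match each treated unit to the untreated unit with the nearest score, and average the outcome differences to estimate the effect of treatment on the treated. All data and names are made up.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Simulated data: selection into treatment depends only on observed covariates X.
n = 2000
X = rng.normal(size=(n, 2))
p_treat = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
D = (rng.random(n) < p_treat).astype(int)                        # treatment indicator
Y = 1.0 * D + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)       # true effect = 1.0

# Step 1: estimate the propensity score P(D = 1 | X).
pscore = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the untreated unit with the nearest score.
treated = np.where(D == 1)[0]
control = np.where(D == 0)[0]
nearest = np.abs(pscore[treated][:, None] - pscore[control][None, :]).argmin(axis=1)
matched = control[nearest]

att = float(np.mean(Y[treated] - Y[matched]))
print("matching estimate of the effect on the treated:", round(att, 2))  # near 1.0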
The Economics of Scale-Up. Working Paper 23925
  • J M Davis
  • J Guryan
  • K Hallberg
  • J Ludwig
Opportunities and Challenges in Evidence-Based Social Policy
  • L Supplee
  • A Metz