Impact evaluation using Difference-in-Differences
Anders Fredriksson and Gustavo Magalhães de Oliveira
Center for Organization Studies (CORS), School of Economics, Business and
Accounting (FEA), University of São Paulo (USP), São Paulo, Brazil
Abstract
Purpose This paper aims to present the Difference-in-Differences (DiD) method in an accessible language to a broad research audience from a variety of management-related fields.
Design/methodology/approach The paper describes the DiD method, starting with an intuitive explanation, goes through the main assumptions and the regression specification and covers the use of several robustness methods. Recurrent examples from the literature are used to illustrate the different concepts.
Findings By providing an overview of the method, the authors cover the main issues involved when
conducting DiD studies, including the fundamentals as well as some recent developments.
Originality/value The paper can hopefully be of value to a broad range of management scholars
interested in applying impact evaluation methods.
Keywords Impact evaluation, Policy evaluation, Management, Causal effects,
Difference-in-Differences, Parallel trends assumption
Paper type Research paper
1. Introduction
Difference-in-Differences (DiD) is one of the most frequently used methods in impact
evaluation studies. Based on a combination of before-after and treatment-control group
comparisons, the method has an intuitive appeal and has been widely used in economics,
public policy, health research, management and other fields. After the introductory section,
this paper outlines the method, discusses its main assumptions, then provides further details
and discusses potential pitfalls. Examples of typical DiD evaluations are referred to
throughout the text, and a separate section discusses a few papers from the broader
management literature. Conclusions are also presented.
Unlike randomized experiments, which allow for a simple comparison of treatment and
control groups, DiD is an evaluation method used in non-experimental
settings. Other members of this family are matching, synthetic control and regression
discontinuity. The goal of these methods is to estimate the causal effects of a program when
treatment assignment is non-random; hence, there is no obvious control group[1]. Although
© Anders Fredriksson and Gustavo Magalhães de Oliveira. Published in RAUSP Management
Journal. Published by Emerald Publishing Limited. This article is published under the Creative
Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create
derivative works of this article (for both commercial and non-commercial purposes), subject to full
attribution to the original publication and authors. The full terms of this licence may be seen at http://
creativecommons.org/licences/by/4.0/legalcode
Anders Fredriksson and Gustavo Magalhães de Oliveira contributed equally to this paper.
The authors thank the editor, two anonymous referees and Pamela Campa, Maria Perrotta Berlin
and Carolina Segovia for feedback that improved the paper. Any errors are our own.
Received 18 May 2019
Revised 27 July 2019
Accepted 8 August 2019
RAUSP Management Journal
Vol. 54 No. 4, 2019
pp. 519-532
Emerald Publishing Limited
2531-0488
DOI 10.1108/RAUSP-05-2019-0112
random assignment of treatment is prevalent in medical studies and has become more
common also in the social sciences, through e.g. pilot studies of policy interventions, most
real-life situations involve non-random assignment. Examples include the introduction of
new laws, government policies and regulation[2]. When discussing different aspects of the
DiD method, a much-researched 2006 healthcare reform in Massachusetts, which aimed to give
nearly all residents healthcare coverage, will be used as an example of a typical DiD study
object. In order to estimate the causal impact of this and other policies, a key challenge is to
find a proper control group.
In the Massachusetts example, one could use as control a state that did not implement the
reform. A DiD estimate of reform impact can then be constructed, which in its simplest form
is equivalent to calculating the after-before difference in outcomes in the treatment group,
and subtracting from this difference the after-before difference in the control group. This
double difference can be calculated whenever treatment and control group data on the
outcomes of interest exist before and after the policy intervention. Having such data is thus
a prerequisite to apply DiD. As will be detailed below, however, fulfilling this criterion does
not imply that the method is always appropriate or that it will give an unbiased estimate of
the causal effect.
Labor economists were among the first to apply DiD methods[3]. Ashenfelter (1978)
studied the effect of training programs on earnings and Card (1990) studied labor market
effects in Miami after a (non-anticipated) influx of Cuban migrants. As a control group, Card
used other US cities, similar to Miami along some characteristics, but without the migration
influx. Card & Krueger (1994) studied the impact of a New Jersey rise in the minimum wage
on employment in fast-food restaurants. Neighboring Pennsylvania maintained its
minimum wage and was used as control. Many other studies followed.
Although the basic method has not changed, several issues have been brought forward
in the literature, and academic studies have evolved along with these developments. Two
non-technical references covering DiD are Gertler, Martinez, Premand, Rawlings, and
Vermeersch (2016) and White & Raitzer (2017), whereas Angrist & Pischke (2009, chapter 5)
and Wooldridge (2012, chapter 13) are textbook references. In chronological order, Angrist
and Krueger (1999), Bertrand, Duflo, and Mullainathan (2004), Blundell & Costa Dias (2000,
2009), Imbens & Wooldridge (2009), Lechner (2011), Athey & Imbens (2017), Abadie &
Cattaneo (2018) and Wing, Simon, and Bello-Gomez (2018) also review the method, including
more technical content. The main issues brought forward in these works and in other
references are discussed below.
2. The Difference-in-Differences method
The DiD method combines insights from cross-sectional treatment-control comparisons and
before-after studies for a more robust identification. First consider an evaluation that seeks
to estimate the effect of a (non-randomly implemented) policy (treatment) by comparing
outcomes in the treatment group to a control group, with data from after the policy
implementation. Assume there is a difference in outcomes. In the Massachusetts health
reform example, perhaps health is better in the treatment group. This difference may be due
to the policy, but also because there are key characteristics that differ between the groups
and that are determinants of the outcomes studied, e.g. income in the health reform example:
Massachusetts is relatively rich, and wealthier people on average have better health. A
remedy for this situation is to evaluate the impact of the policy after controlling for the
factors that differ between the two groups. This is only possible for observable
characteristics, however. Perhaps important socioeconomic and other characteristics that
determine outcomes are not in the dataset, or even fundamentally unobservable. And even if
it would be possible to collect additional data for certain important characteristics, the
knowledge about which are all the relevant variables is imperfect. Controlling for all
treatment-control group differences is thus difficult.
Consider instead a before-after study, with data from the treatment group. The policy
under study is implemented between the before and after periods. Assume a change over
time is observed in the outcome variables of interest, such as better health. In this case, the
change may have been caused by the policy, but may also be due to other changes that
occurred at the same time as the policy was implemented. Perhaps there were other relevant
government programs during the time of the study, or the general health status is changing
over time. With treatment group data only, the change in the outcome variables may be
incorrectly attributed to the intervention under study.
Now consider combining the after-before approach and the treatment-control group
comparison. If the after-before difference in the control group is deducted from the same
difference in the treatment group, two things are achieved. First, if other changes that occur
over time are also present in the control group, then these factors are controlled for when
the control group after-before difference is netted out from the impact estimate. Second, if
there are important characteristics that are determinants of outcomes and that differ
between the treatment and control groups, then, as long as these treatment-control group
differences are constant over time, their influence is eliminated by studying changes over
time. Importantly, this latter point applies also to treatment-control group differences in
time-invariant unobservable characteristics (as they are netted out). It is thus possible to get
around the problem, present in cross-sectional studies, that one cannot control for
unobservable factors (further discussed below).
To formalize some of what has been said above, the basic DiD study has data from two
groups and two time periods, and the data is typically at the individual level, that is, at a
lower level than the treatment intervention itself. The data can be repeated cross-sectional
samples of the population concerned (ideally random draws) or a panel. Wooldridge (2012,
chapter 13) gives examples of DiD studies using the two types of data structures and
discusses the potential advantages of having a panel rather than repeated cross sections
(also refer to Angrist & Pischke, 2009, chapter 5; and Lechner, 2011).
With two groups and two periods, and with a sample of data from the population of
interest, the DiD estimate of policy impact can be written as follows:
DiD = (ȳ_{s=Treatment, t=After} − ȳ_{s=Treatment, t=Before}) − (ȳ_{s=Control, t=After} − ȳ_{s=Control, t=Before})   (1)

where y is the outcome variable, the bar represents the average value (averaged over
individuals, typically indexed by i), the group is indexed by s (because in many studies,
policies are implemented at the state level) and t is time. With before and after data for
treatment and control, the data is thus divided into the four groups and the above double
difference is calculated. The information is typically presented in a 2×2 table, then a third
row and a third column are added in order to calculate the after-before and treatment-control
differences and the DiD impact measure. Figure 1 illustrates how the DiD estimate is
constructed.
The above calculation and illustration say nothing about the significance level of the DiD
estimate; hence, regression analysis is used. In an OLS framework, the DiD estimate is
obtained as the β-coefficient in the following regression, in which A_s are treatment/control
group fixed effects, B_t before/after fixed effects, I_st is a dummy equaling 1 for treatment
observations in the after period (otherwise it is zero) and ε_ist the error term[4]:

y_ist = A_s + B_t + β I_st + ε_ist   (2)
In order to verify that the estimate of β will recover the DiD estimate in (1), use (2) to get:

E(y_ist | s = Control, t = Before) = A_Control + B_Before
E(y_ist | s = Control, t = After) = A_Control + B_After
E(y_ist | s = Treatment, t = Before) = A_Treatment + B_Before
E(y_ist | s = Treatment, t = After) = A_Treatment + B_After + β

In these expressions, E(y_ist | s, t) is the expected value of y_ist in population subgroup (s, t),
which is estimated by the sample average ȳ_{s,t}. Estimating (2) and plugging the sample
counterparts of the above expressions into (1), with the hat notation representing coefficient
estimates, gives DiD = β̂[5].
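To make this equivalence concrete, the following sketch simulates a 2×2 setting (the group effect, time effect and the true treatment effect of 2.0 are invented numbers) and checks that the OLS interaction coefficient from a regression in the style of expression (2) equals the double difference in (1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate individual-level data for a 2x2 DiD: two groups (s) and two periods (t).
n = 500  # individuals per group-period cell
rows = []
for s in (0, 1):          # 0 = control, 1 = treatment
    for t in (0, 1):      # 0 = before, 1 = after
        # Group effect 1.0*s, time effect 0.5*t, true treatment effect beta = 2.0
        y = 1.0 * s + 0.5 * t + 2.0 * (s * t) + rng.normal(0.0, 1.0, n)
        rows.append((s, t, y))

# Expression (1): the double difference of sample cell means.
means = {(s, t): y.mean() for s, t, y in rows}
did = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])

# Expression (2) as OLS: intercept, group dummy, time dummy, interaction I_st.
S = np.concatenate([np.full(n, s, dtype=float) for s, t, y in rows])
T = np.concatenate([np.full(n, t, dtype=float) for s, t, y in rows])
Y = np.concatenate([y for s, t, y in rows])
X = np.column_stack([np.ones_like(S), S, T, S * T])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0][3]

# The interaction coefficient equals the double difference of means.
assert abs(beta_hat - did) < 1e-8
```

Because the two-group two-period regression is saturated, the equality holds exactly (up to floating-point precision), not just approximately.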
The DiD model is not limited to the 2×2 case, and expression (2) is written in a more
general form than what was needed so far. For models with several treatment and/or
control groups, A_s stands for fixed effects for each of the different groups. Similarly, with
several before and/or after periods, each period has its own fixed effect, represented by B_t. If
the reform is implemented in all treatment groups/states at the same time, I_st switches from
zero to one in all such locations at the same time. In the general case, however, the reform is
staggered and hence implemented in different treatment groups/states s at different times t.
I_st then switches from 0 to 1 accordingly. All these cases are covered by expression 2[6].
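With a staggered reform, the indicator I_st can be built directly from each state's adoption date. A minimal sketch, with hypothetical state names and adoption years:

```python
# Build the staggered treatment dummy I_st: it switches from 0 to 1 in the
# period in which state s adopts the reform, and stays 1 thereafter.
# Adoption years are hypothetical; None marks a never-treated control state.
adoption_year = {"A": 2005, "B": 2007, "C": None}
years = range(2003, 2010)

I = {
    (s, t): int(adopt is not None and t >= adopt)
    for s, adopt in adoption_year.items()
    for t in years
}
```

The resulting dictionary maps each (state, year) cell to 0 or 1 and can serve as the interaction regressor in expression (2).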
Individual-level control variables X_ist can also be added to the regression, which
becomes:

y_ist = A_s + B_t + c X_ist + β I_st + ε_ist   (3A)
An important aspect of DiD estimation concerns the data used. Although it cannot be done
with a 2×2 specification (as there would be four observations only), models with many time
periods and treatment/control groups can also be analyzed with state-level (rather than
individual-level) data (e.g. US or Brazilian data, with 50 and 27 states, respectively). There
would then be no i-index in regression 3A. Perhaps the relevant data is at the state level (e.g.
unemployment rates from statistical institutes). Individual-level observations can also be
Figure 1. Illustration of the two-group two-period DiD estimate. The assumed treatment group counterfactual equals the treatment group pre-reform value plus the after-before difference from the control group.
aggregated. An advantage of the latter approach is that one avoids the problem (discussed
in Section 4) that the within group-period (e.g. state-year) error terms tend to be correlated
across individuals, hence standard errors should be corrected. With either type of data,
state-level control variables Z_st may also be included in expression 3A[7]. A more general form
of the regression specification, with individual-level data, becomes:

y_ist = A_s + B_t + c X_ist + d Z_st + β I_st + ε_ist   (3B)
3. Parallel trends and other assumptions
Estimation of DiD models hinges upon several assumptions, which are discussed in detail
by Lechner (2011). The following paragraphs are mainly dedicated to the parallel trends
assumption, the discussion of which is a requirement for any DiD paper ("no pre-treatment
effects" and "common support" are also discussed below). Another important assumption is
the Stable Unit Treatment Value Assumption, which implies that there should be no
spillover effects between the treatment and control groups, as the treatment effect would
then not be identified (Duflo, Glennerster, & Kremer, 2008). Furthermore, the control
variables X_ist and Z_st should be exogenous, unaffected by the treatment. Otherwise, β̂ will
be biased. A typical approach is to use covariates that predate the intervention itself,
although this does not fully rule out endogeneity concerns, as there may be anticipation
effects. In some DiD studies and data sets, the controls may be available for each time period
(as suggested by the t-index on X_ist and Z_st), which is fine as long as they are not affected by
the treatment. Implied by the assumptions is that there should be no compositional changes
over time. An example would be if individuals with poor health move to Massachusetts
(from a control state to the treatment state). The health reform impact would then likely be
underestimated.
Identification based on DiD relies on the parallel trends assumption, which states that the
treatment group, absent the reform, would have followed the same time trend as the control
group (for the outcome variable of interest). Observable and unobservable factors may cause
the level of the outcome variable to differ between treatment and control, but this difference
(absent the reform in the treatment group) must be constant over time. Because the
treatment group is only observed as treated, the assumption is fundamentally untestable.
One can lend support to the assumption, however, through the use of several periods of pre-
reform data, showing that the treatment and control groups exhibit a similar pattern in pre-
reform periods. If such is the case, the conclusion that the impact estimated comes from the
treatment itself, and not from a combination of other sources (including those causing the
different pre-trends), becomes more credible. Pre-trends cannot be checked in a dataset with
one before-period only, however (Figure 1). In general, such studies are therefore less robust.
A certain number of pre-reform periods is highly desirable and certainly a recommended
"best practice" in DiD studies.
The papers on the New Jersey minimum wage increase by Card & Krueger (1994, 2000)
(the first referred to in Section 1) illustrate this contention and its relevance. The 1994 paper
uses a two-period dataset, February 1992 (before) and November 1992 (after). By using DiD,
the paper implicitly assumes parallel trends. The authors conclude that the minimum wage
increase had no negative effect on fast-food restaurant employment. In the 2000 paper, the
authors have access to additional data, from 1991 to 1997. In a graph of employment over
time, there is little visual support for the parallel trends assumption. The extended dataset
suggests that employment variation may be due to other time-varying factors than the
minimum wage policy itself (for further discussion, refer to Angrist & Pischke, 2009,
chapter 5).
Figure 2(a) exemplifies, from Galiani, Gertler, and Schargrodsky (2005) and Gertler et al.
(2016), how visual support for the parallel trends assumption is typically verified in
empirical work. The authors study the impact of privatizing water services on child
mortality in Argentina. Using a decade of mortality data and comparing areas with
privatized (treatment) and non-privatized water companies (control), similar pre-reform
(pre-1995) trends are observed. In this case also the levels are almost identical, but this is not
a requirement. The authors go on to find a statistically significant reduction in child
mortality in areas with privatized water services. Figure 2(b) provides another example,
with data on a health variable before (and after) the 2006 Massachusetts reform, as
illustrated by Courtemanche & Zapata (2014).
A more formal approach to provide support for the parallel trends assumption is to conduct
placebo regressions, which apply the DiD method to the pre-reform data itself. There should
then be no significant treatment effect. When running such placebo regressions, one option is
to exclude all post-treatment observations and analyze the pre-reform periods only (if there is
enough data available). In line with this approach, Schnabl (2012), who studies the effects of the
1998 Russian financial crisis on bank lending, uses two years of pre-crisis data for a placebo
test. An alternative is to use all data, and add to the regression specification interaction terms
between each pre-treatment period and the treatment group indicator(s). The latter method is
used by Courtemanche & Zapata (2014), studying the Massachusetts health reform. A further
robustness test of the DiD method is to add specific time-trend terms for the treatment and
control groups, respectively, in expression 3B, and then check that the difference in trends is
not significant (Wing et al., 2018, p. 459)[8].
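The first kind of placebo test can be sketched as follows: keep pre-reform periods only, pretend the reform happened at an arbitrary mid-sample date, and verify that the estimated "effect" is close to zero. All data below is simulated, with parallel trends built in by construction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated panel: 2 groups x 6 pre-reform years, parallel trends by construction.
years = np.arange(2000, 2006)
n = 400  # individuals per group-year cell
rows = []
for s in (0, 1):                      # 0 = control, 1 = (future) treatment group
    for t in years:
        # Level gap of 2.0 between groups, common trend of 0.3 per year, no effect.
        y = 2.0 * s + 0.3 * (t - 2000) + rng.normal(0.0, 1.0, n)
        rows.append((s, t, y))

S = np.concatenate([np.full(n, s, dtype=float) for s, t, y in rows])
T = np.concatenate([np.full(n, t, dtype=float) for s, t, y in rows])
Y = np.concatenate([y for s, t, y in rows])

# Placebo: pretend treatment started in 2003, inside the pre-reform window.
placebo = ((S == 1) & (T >= 2003)).astype(float)

# OLS with a group dummy, year dummies and the placebo interaction.
year_dummies = np.column_stack([(T == t).astype(float) for t in years[1:]])
X = np.column_stack([np.ones_like(S), S, year_dummies, placebo])
beta_placebo = np.linalg.lstsq(X, Y, rcond=None)[0][-1]

# With parallel trends built in, the placebo coefficient should be near zero.
assert abs(beta_placebo) < 0.2
```

In a real application one would also report the standard error of the placebo coefficient and check that it is statistically insignificant, not merely small.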
The above discussion concerns the "raw" outcome variable itself. Lechner (2011) formulates
the parallel trends assumption conditional on control variables (which should be exogenous).
One study using a conditional parallel trends assumption is the paper on mining and local
economic activity in Peru by Aragón & Rud (2013), especially their Figure 3. Another issue,
which can be inspected in graphs such as Figure 2, is that there should be no effect from the
reform before its implementation. Finally, "common support" is needed. If the treatment group
Figure 2. Graphs used to visually check the parallel trends assumption. (a) (left) Child mortality rates, different areas of Buenos Aires, Argentina, 1990-1999 (reproduced from Galiani et al., 2005); (b) (right) Days per year not in good physical health, 2001-2009, Massachusetts and control states (from Courtemanche & Zapata, 2014).
includes only high values of a control variable and the control group only low values, one is, in
fact, comparing incomparable entities. There must instead be overlap in the distribution of the
control variables between the different groups and time periods.
It should be noted that the parallel trends assumption is scale dependent, which is an
undesirable feature of the DiD method. Unless the outcome variable is constant during the
pre-reform periods, in both treatment and control, it matters if the variable is used "as is" or
if it is transformed (e.g. wages vs log wages). One approach to this issue is to use the data in
the form corresponding to the parameter one wants to estimate (Lechner, 2011), rather than
adapting the data to a format that happens to fit the parallel trends assumption.
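A small invented example makes the scale dependence concrete: if both groups grow by ten per cent, the trends are exactly parallel in logs but not in levels:

```python
import math

# Invented pre-reform averages: both groups grow by 10 per cent,
# so trends are parallel in logs but not in levels.
treat_before, treat_after = 100.0, 110.0
ctrl_before, ctrl_after = 50.0, 55.0

level_gap_change = (treat_after - treat_before) - (ctrl_after - ctrl_before)
log_gap_change = (math.log(treat_after) - math.log(treat_before)) - (
    math.log(ctrl_after) - math.log(ctrl_before)
)

# In levels the naive "DiD" is 5.0; in logs it is zero.
assert level_gap_change == 5.0
assert abs(log_gap_change) < 1e-12
```

Whether the level or the log comparison is the relevant one depends on the parameter of interest, not on which transformation makes the pre-trends look parallel.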
A closing remark in this section is that it is worth spending time when planning the
empirical project, before the actual analysis, carefully considering all possible data sources,
if first-hand data needs to be collected, etc. Perhaps data limitations are such that a robust
DiD study including a parallel trend check is not feasible. On the other hand, in the
process of learning about the institutional details of the intervention studied, new data
sources may appear.
4. Further details and considerations for the use of Difference-in-Differences
4.1 Using control variables for a more robust identification
With a non-random assignment to treatment, there is always the concern that the treatment
states would have followed a different trend than the control states, even absent the reform. If,
however, one can control for the factors that differ between the groups and that would lead to
differences in time trends (and if these factors are exogenous), then the true effect from the
treatment can be estimated[9]. In the above regression framework (expression 3B), one should
thus control for the variables that differ between treatment and control and that would cause
time trends in outcomes to differ. With treatment assignment at the state level, this is primarily
a concern for state-level control variables (Z_st). The main reason for including also individual-
level controls (X_ist) is instead to decrease the variance of the regression coefficient estimates
(Angrist & Pischke, 2009, chapters 2 and 5; Wooldridge, 2012, chapters 6 and 13).
Matching is another way to use control variables to make DiD more robust. As suggested
by the name, treatment and control group observations are matched, which should reduce
bias. First, think of a cross-sectional study with one dichotomous state-level variable that is
relevant for treatment assignment and outcomes (e.g. Democrat/Republican state). Also
assume that, even if states of one category/type are more likely to be treated, there are still
treatment and control states of both types (common support). In this case, separate
treatment effects would first be estimated for each category. The average treatment effect is
then obtained by weighting with the number of treated states in each category. When the
number of control variables grows and/or the variables take on many different values (or are
continuous), such exact matching is typically not possible. One alternative is to instead use the
multidimensional space of covariates Z_s and calculate the distance between observations in
this space. Each treatment observation is matched to one or several control observations
(through e.g. Mahalanobis matching, n-nearest neighbor matching), then an averaging is
done over the treatment observations. Coarsening is another option. The multidimensional
Z_s-space is divided into different bins, observations are matched within bins and the average
treatment effect is obtained by weighting over bins. Yet another option is the propensity score,
P(Z_s). This one-dimensional measure represents the probability, given Z_s, that a state
belongs to the treatment group. In practice, P(Z_s) is the predicted probability from a logit or
probit model of the treatment indicator regressed on Z_s. The method thus matches
observations based on the propensity score, again using n-nearest neighbor matching,
etc.[10]
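A one-nearest-neighbor match on a single state-level covariate can be sketched as follows (all state names, covariate values and outcomes are invented): each treated state is paired with the control state closest in the covariate, and the matched DiD averages the resulting double differences:

```python
# Invented state-level data: covariate Z and before/after outcome averages.
treated = {          # state: (Z, y_before, y_after)
    "T1": (0.8, 10.0, 14.0),
    "T2": (0.3, 6.0, 9.0),
}
controls = {
    "C1": (0.7, 9.0, 11.0),
    "C2": (0.2, 5.0, 6.5),
}

did_estimates = []
for name, (z, yb, ya) in treated.items():
    # 1-nearest-neighbor match on the covariate distance |Z_treated - Z_control|.
    match = min(controls, key=lambda c: abs(controls[c][0] - z))
    _, cb, ca = controls[match]
    # Double difference for this matched pair, as in expression (1).
    did_estimates.append((ya - yb) - (ca - cb))

matched_did = sum(did_estimates) / len(did_estimates)
```

With many covariates the scalar distance would be replaced by e.g. a Mahalanobis distance or an estimated propensity score, but the matched-pair averaging step stays the same.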
When implementing matching in DiD studies, treatment and control observations are
matched with methods similar to the above, e.g. coarsening or propensity score. In the case
of a 2×2 study, a double difference similar to (1) is calculated, but the control group
observations are weighted according to the results of the matching procedure[11]. An
example of a DiD + matching study of the Massachusetts reform is Sommers, Long, and
Baicker (2014). Based on county-level data, the authors use the propensity score to find a
comparison group to Massachusetts counties.
A third approach using control variables is the synthetic control method. Similar to DiD,
it aims at balancing pre-intervention trends in the outcome variables. In the original
reference, Abadie & Gardeazabal (2003) construct a counterfactual Basque Country by
using data from other Spanish regions. Inspired by matching, the method minimizes the
(multidimensional) distance between the values of the covariates in the treatment and
control groups, by choosing different weights for the different control regions. The distance
measure also depends, however, on a weight factor for each individual covariate. This
second set of weights is chosen such that the pre-intervention trend in the control group, for
the outcome of interest, is as close as possible to the pre-intervention trend for the treatment
group. As described by Abadie & Cattaneo (2018), the synthetic control method aims at
providing a "data-driven" control group selection (and is typically implemented in
econometrics software packages).
The Massachusetts health study of Courtemanche & Zapata (2014) illustrates a practice
for how a DiD study may go about selecting a control group. In the main specification, the
authors use the rest of the United States as control (except a few states), and pre-reform
trends are checked (including placebo tests). The control group is thereafter restricted,
respectively, to the ten states with the most similar pre-reform health outcomes, to the ten
states with the most similar pre-reform health trends and to other New England states only.
Synthetic controls are also used. The DiD estimate is similar across specifications.
Related to the discussion of control variables is the threat to identification from
compositional changes, briefly mentioned in Section 3. Assume a certain state implements a
health reform. Compare with a neighboring state. If the policy induces control group
individuals with poor health to move to the treatment state, the treatment outcome will then
be composed also of these movers. In this case, the ideal is to have data on (and control for)
individuals' "migration status". In practice, such data may not be available and controls X_ist
and Z_st are instead used. This is potentially not enough, however, as there may be changes
also in unobserved factors and/or spillovers and complementarities related to the changes in
e.g. socioeconomic variables. One practice used to lend credibility to a DiD analysis is to
search for treatment-induced compositional changes by using each covariate as a dependent
variable in an expression 2-style regression. Any significant effect (the β-coefficient) would
indicate a potentially troublesome compositional change (Aragón & Rud, 2013).
4.2 Difference-in-Difference-in-Differences
Difference-in-Difference-in-Differences (DiDiD) is an extension of the DiD concept (Angrist
& Pischke, 2009), briefly mentioned here through an example. Long, Yemane, & Stockley (2010)
study the effects of the special provisions for young people in the Massachusetts health
reform. The authors use data on both young adults and slightly older adults. Through the
DiDiD method, they compare the change over time in health outcomes for young adults in
Massachusetts to young adults in a comparison state and to slightly older adults in
Massachusetts and construct a triple difference, to also control for other changes that occur
in the treatment state.
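With invented after-before changes, the triple difference is simply the DiD for the targeted group minus the DiD for the non-targeted group:

```python
# Invented after-before changes in a health outcome, by state and age group.
change = {
    # (state, age group): after - before
    ("MA", "young"): 3.0,
    ("MA", "older"): 1.0,
    ("control", "young"): 0.5,
    ("control", "older"): 0.4,
}

did_young = change[("MA", "young")] - change[("control", "young")]  # DiD, targeted group
did_older = change[("MA", "older")] - change[("control", "older")]  # DiD, non-targeted group
triple_diff = did_young - did_older
```

Subtracting the older-adult DiD nets out state-level changes that affected all ages, isolating the effect of the youth-specific provisions.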
4.3 Standard errors[12]
In the basic OLS framework, observations are assumed to be independent and standard
errors homoscedastic. The standard errors of the regression coefficients then take a
particularly simple form. Such errors are typically corrected, however, to allow for
heteroscedasticity (Eicker-Huber-White heteroscedasticity-robust standard errors). The
second "standard" correction is to allow for clustering. Think of individual-level data from
different regions, where some regions are treated; others are not. Within a region (cluster),
the individuals are likely to share many characteristics: perhaps they go to the same schools,
work at the same firms, have access to the same media outlets, are exposed to similar
weather, etc. Factors such as these make observations within clusters correlated. In effect,
there is less variation than if the data had been independent random draws from the
population at large. Standard errors need to be corrected accordingly, typically implying
that the significance levels of the regression coefficients are reduced[13].
For correct inference with DiD, a third adjustment needs to be done. With many time
periods, the data can exhibit serial correlation. This holds for many typical dependent
variables in DiD studies, such as health outcomes, and, in particular, the treatment variable
itself. The observations within each of the treatment and control groups can thus be
correlated over time. Failing to correct for this fact can greatly overstate significance levels,
which was the topic of the influential paper by Bertrand et al. (2004).
One way of handling the within-group clustering issue is to collapse the individual data
to state-level averages. Similarly, the serial correlation problem can be handled by
collapsing all pre-treatment periods to one before-period, and all post-treatment periods to
one after-period. Having checked the parallel trends assumption, one thus works with two
periods of data, at the state level (which requires many treatment and control states). A
drawback, however, is that the sample size is greatly reduced. The option to instead
continue with the individual-level data and calculate standard errors that are robust to
heteroscedasticity, within-group effects and serial correlation is provided by many
econometric software packages.
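A sketch of the clustering correction, assuming simulated data with a strong state-level error component: the CR1 cluster-robust variance sums a per-state "meat" term and, in this design, yields a much larger standard error than the naive homoscedastic formula:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: 20 states, half treated; strong shared state-level shock.
n_states, n_per_state = 20, 50
state = np.repeat(np.arange(n_states), n_per_state)
treat = (state < n_states // 2).astype(float)
state_shock = rng.normal(0.0, 2.0, n_states)[state]   # shared within each state
y = 1.0 + 0.5 * treat + state_shock + rng.normal(0.0, 1.0, state.size)

# OLS of y on an intercept and the (state-level) treatment dummy.
X = np.column_stack([np.ones_like(treat), treat])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
k = X.shape[1]

# Naive homoscedastic OLS standard error of the treatment coefficient.
sigma2 = resid @ resid / (X.shape[0] - k)
se_naive = np.sqrt(sigma2 * XtX_inv[1, 1])

# CR1 cluster-robust variance: sum the per-state "meat" terms X_g' e_g e_g' X_g.
meat = np.zeros((k, k))
for g in range(n_states):
    idx = state == g
    v = X[idx].T @ resid[idx]
    meat += np.outer(v, v)
G, N = n_states, X.shape[0]
c = (G / (G - 1)) * ((N - 1) / (N - k))   # CR1 small-sample correction
V_cluster = c * XtX_inv @ meat @ XtX_inv
se_cluster = np.sqrt(V_cluster[1, 1])

# With state-level shocks, ignoring clustering understates the uncertainty.
assert se_cluster > se_naive
```

In practice one would simply request clustered standard errors from the regression routine (clustering on the state), but the sandwich structure above is what those routines compute.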
5. Examples of Difference-in-Differences studies in the broader management literature
The DiD method is increasingly applied in management studies. A growing number of
scholars use the method in areas such as innovation (Aggarwal & Hsu, 2014; Flammer &
Kacperczyk, 2016; Singh & Agrawal, 2011), board of directors composition (Berger, Kick, &
Schaeck, 2014), lean production (Distelhorst, Hainmueller, & Locke, 2016), organizational
goals management (Holm, 2018), CEO remuneration (Conyon, Hass, Peck, Sadler, & Zhang,
2019), regulatory certication (Bruno, Cornaggia, & Cornaggia, 2016), social media (Kumar,
Bezawada, Rishika, Janakiraman, & Kannan (2016), employee monitoring (Pierce, Snow, &
McAfee, 2015) and environmental policy (He & Zhang, 2018).
Different sources of exogenous variation have been used for econometric identification in
DiD papers in the management literature. A few examples are given here. Chen, Crossland,
& Huang (2014) study the effects of female board representation on mergers and
acquisitions. In a robustness test to their main analysis, further addressing the issue that
board composition may be endogenous, the authors exploit the fact that female board
representation increases exogenously if a male board director dies. A small sample of 24
such firms is identified and matched to 24 control firms, and a basic two-group, two-period
DiD regression is run on this sample.
Younge, Tong, and Fleming (2014) instead use DiD as the main method and study how
constraints on employee mobility affect the acquisition likelihood. The authors use as a
Impact evaluation 527
source of identification a 1985 change in the Michigan antitrust law that had as an effect that
employers could prohibit workers from leaving for a competitor. Ten US states, where no
changes allegedly occurred around 1985, are used as the control group. The authors also use
(coarsened exact) matching on firm characteristics to select the control group firms most
similar to the Michigan firms. In addition, graphs of pre-treatment trends are presented.
Hosken, Olson, and Smith (2018) study the effect of mergers on competition. The authors
do not have an exogenous source of variation, which is discussed at length. They compare
grocery retail prices in geographical areas where horizontal mergers have taken place
(treatment) to areas without such mergers. Several different control groups are constructed,
and a test using pre-treatment price data only is conducted to verify that there is no difference
in price trends. Synthetic controls are also used.
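A pre-treatment trends check of this kind can be sketched as follows (a purely illustrative simulation, not Hosken et al.'s data or code; the period count, group levels and noise scale are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pre-treatment outcome means for the two groups over six
# periods; under parallel trends the groups may differ in level but
# should share the same slope.
periods = np.arange(6)
treat_means = 10.0 + 0.5 * periods + rng.normal(0, 0.05, 6)
control_means = 8.0 + 0.5 * periods + rng.normal(0, 0.05, 6)

# Fit a linear trend to each group's pre-period means and compare slopes.
slope_t = np.polyfit(periods, treat_means, 1)[0]
slope_c = np.polyfit(periods, control_means, 1)[0]
gap = slope_t - slope_c  # close to zero when pre-trends are parallel
```

In practice this is typically done by regressing the outcome on group-specific time trends over the pre-period and formally testing that the slope difference is zero.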
Another study is Flammer (2015), who investigates whether product market competition
affects investments in corporate social responsibility. Flammer (2015) uses import tariff
reductions as the source of variation in the competitive environment and compares affected
sectors (treatment) to non-affected sectors (control) over time. A matching procedure is used
to increase comparability between the groups, and a robustness check restricts the sample to
treatment sectors where the tariff reductions are likely to be de facto exogenous. The author
also uses control variables in the DiD regression, but as pointed out in the paper, these
variables have already been used in the matching procedure, and their inclusion does not
alter the results.
Lemmon & Roberts (2010) study regulatory changes in the insurance industry as an
exogenous contraction in the supply of below-investment-grade credit. Using Compustat
data, they undertake a DiD analysis complemented by propensity score matching and
explicitly analyze the parallel trends assumption. Iyer, Peydró, da-Rocha-Lopes, and Schoar
(2013) examine how banks react in terms of lending when facing a negative liquidity shock.
Based on Portuguese corporate loan-level data, they undertake a DiD analysis, with an
identification strategy that exploits the unexpected shock to the interbank markets in
August 2007. Other papers that have used DiD to study the effect of shocks to credit supply
are Schnabl (2012), referenced above, and Khwaja & Mian (2008).
In addition to these topics, several DiD papers published in management journals relate
to public policy and health, an area reviewed by Wing et al. (2018). The above-referenced
Aragón & Rud (2013) and Courtemanche & Zapata (2014) are two of many papers that apply
several parts of the DiD toolbox.
6. Discussion and conclusion
The paper presents an overview of the DiD method, summarized here in terms of some
practical recommendations. Researchers wishing to apply the method should carefully plan
their research design and think about what the source of (preferably exogenous) variation is,
and how it can identify causal effects. The control group should be comparable to the treatment
group and have the same data availability. Matching and other methods can refine the control
group selection. Enough time periods should be available to credibly motivate the parallel
trends assumption; if it is not fulfilled, DiD is likely not an appropriate method.
The robustness of the analysis can be enhanced by using exogenous control variables, either
directly in the regression and/or through a matching procedure. Standard errors should be
robust and clustered in order to account for heteroscedasticity, within-group correlation and
serial correlation. Details may differ, however, including what the relevant cluster is, which
depends on the study at hand, and researchers are encouraged to delve further into this topic
(Bertrand et al., 2004; Cameron & Miller, 2015). Other methods, such as DiDiD and synthetic
controls, were discussed, while a discussion of e.g. time-varying treatment effects and another
RAUSP 54,4 528
quasi-experimental technique, regression discontinuity, was left out. Several methodological
DiD papers were cited above; reading them is encouraged, perhaps together with texts
covering other non-experimental methods.
The choice of research method will vary according to many circumstances. DiD has the
potential to be a feasible design in many subfields of management studies, and scholars
interested in the topic will hopefully find this text of interest. The wide range of surveys and
databases (Economatica, Capital IQ and Compustat are a few examples) enables the
application of DiD in distinct contexts and to different research questions. Beyond data, the
above-cited studies also demonstrate innovative ways of obtaining an exogenous source of
variation for a credible identification strategy.
Notes
1. The reader is assumed to have basic knowledge about regression analysis (e.g. Wooldridge, 2012)
and also about the core concepts in impact evaluation, e.g. identification strategy, causal
inference, counterfactuals, randomization and treatment effects (e.g. Gertler, Martinez, Premand,
Rawlings, & Vermeersch, 2016, chapters 3-4; White & Raitzer, 2017, chapters 3-4).
2. In this text, the terms policy, program, reform, law, regulation, intervention, shock or
treatment are used interchangeably when referring to the object being evaluated, i.e. the
treatment.
3. Lechner (2011) provides a historical account, including Snow's study of cholera in London in the 1850s.
4. The variable denominations are similar to those in Bertrand et al. (2004). An alternative way to
specify regression 2, in the 2 × 2 case, is to use an intercept, treatment and after dummies, and a
dummy equaling the interaction between the treatment and after dummies (e.g. Wooldridge,
2012, chapter 13). The regression results are identical.
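This equivalence is easy to verify numerically. Below is a hedged sketch with simulated data, using plain least squares in numpy as a stand-in for a regression package; the coefficient on the interaction dummy reproduces the double difference of cell means:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
treat = rng.integers(0, 2, n).astype(float)  # treatment-group dummy
after = rng.integers(0, 2, n).astype(float)  # after-period dummy
y = 0.5 + 1.0 * treat + 0.8 * after + 2.0 * treat * after + rng.normal(0, 1, n)

# OLS with an intercept, the two dummies and their interaction.
X = np.column_stack([np.ones(n), treat, after, treat * after])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Double difference of the four cell means.
m = lambda t, a: y[(treat == t) & (after == a)].mean()
did = (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

# In this saturated 2 x 2 model the two coincide (up to floating point).
assert np.isclose(beta[3], did)
```

The agreement is exact because the 2 × 2 specification is saturated: OLS fits the four cell means perfectly, so the interaction coefficient is mechanically the double difference.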
5. Angrist & Pischke (2009), Blundell & Costa Dias (2009), Lechner (2011) and Wing et al. (2018) are
examples of references that provide additional details on the correspondence between the
potential outcomes framework, the informal/intuitive/graphical derivation of the DiD measure
and the regression specication, as well as a discussion of population vs. sample properties.
6. Note that the interpretation of β changes somewhat if the reform is staggered (Goodman-Bacon,
2018). An even more general case, not covered in this text, is when I_st switches on and off. A
particular group/state can then go back and forth between being treated and untreated (e.g. Bertrand
et al., 2004). Again different is the case where I_st is continuous (e.g. Aragón & Rud, 2013).
7. Note that X_ist and Z_st are both vectors of variables. The X-variables could be e.g. gender, age and
income, i.e. three variables, each with individual-level observations. Z_st can be e.g. state
unemployment, variables representing racial composition, number of hospital beds, etc.,
depending on the study. The regression coefficients c and d are (row) vectors.
8. See also Wing et al. (2018, pp. 460-461) for a discussion of the related concept of event studies.
Their set-up can also be used to study short- and long-term reform effects. A slightly different
type of placebo test is to use control states only, to study if there is an effect where there should
be none (Bertrand et al., 2004).
9. In relation to this discussion, note that the Difference-in-Differences method estimates the
Average Treatment Effect on the Treated, not on the population (e.g. Blundell & Costa Dias,
2009; Lechner, 2011; White & Raitzer, 2017, chapter 5).
10. Matching (also referred to as selection on observables) hinges upon the Conditional
Independence Assumption (CIA) (or unconfoundedness), which says that, conditional on the
control variables, treatment and control would have the same expected outcome, in either
treatment state (treated/untreated). Hence the treatment group, if untreated, would have the same
expected outcome as the control group, and the selection bias disappears (e.g. Angrist &
Pischke, 2009, chapter 3). Rosenbaum & Rubin (1983) showed that if the CIA holds for a set of
variables Z_s, then it also holds for the propensity score P(Z_s).
11. Such a method is used for panel data. When the data are repeated cross sections, each of the three
groups (treatment-before, control-before and control-after) needs to be matched to the
treatment-after observations (Blundell & Costa Dias, 2000; Smith & Todd, 2005).
12. For a general discussion, refer to Angrist & Pischke (2009) and Wooldridge (2012). Abadie,
Athey, Imbens, and Wooldridge (2017), Bertrand et al. (2004) and Cameron & Miller (2015)
provide more details.
13. When there are group effects, it is important to have a large enough number of group-period cells
in order to apply DiD, an issue further discussed in Bertrand et al. (2004).
References
Abadie, A., & Cattaneo, M. D. (2018). Econometric methods for program evaluation. Annual Review of
Economics, 10, 465–503.
Abadie, A., & Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque
Country. American Economic Review, 93, 113–132.
Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. (2017). When should you adjust standard
errors for clustering? (NBER Working Paper No. 24003). National Bureau of Economic Research
(NBER).
Aggarwal, V. A., & Hsu, D. H. (2014). Entrepreneurial exits and innovation. Management Science, 60,
867–887.
Angrist, J. D., & Krueger, A. B. (1999). Empirical strategies in labor economics. In Ashenfelter, O., &
Card, D. (Eds), Handbook of labor economics (Vol. 3, pp. 1277–1366). Amsterdam, The
Netherlands: Elsevier.
Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist's companion.
Princeton, NJ: Princeton University Press.
Aragón, F. M., & Rud, J. P. (2013). Natural resources and local communities: Evidence from a Peruvian
gold mine. American Economic Journal: Economic Policy, 5, 1–25.
Ashenfelter, O. (1978). Estimating the effect of training programs on earnings. The Review of
Economics and Statistics, 60, 47–57.
Athey, S., & Imbens, G. W. (2017). The state of applied econometrics: Causality and policy evaluation.
Journal of Economic Perspectives, 31, 3–32.
Berger, A. N., Kick, T., & Schaeck, K. (2014). Executive board composition and bank risk taking.
Journal of Corporate Finance, 28, 48–65.
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). How much should we trust differences-in-differences
estimates? The Quarterly Journal of Economics, 119, 249–275.
Blundell, R., & Costa Dias, M. (2000). Evaluation methods for non-experimental data. Fiscal Studies, 21,
427–468.
Blundell, R., & Costa Dias, M. (2009). Alternative approaches to evaluation in empirical
microeconomics. Journal of Human Resources, 44, 565–640.
Bruno, V., Cornaggia, J., & Cornaggia, J. K. (2016). Does regulatory certification affect the information
content of credit ratings? Management Science, 62, 1578–1597.
Cameron, A. C., & Miller, D. L. (2015). A practitioner's guide to cluster-robust inference. Journal of
Human Resources, 50, 317–372.
Card, D. (1990). The impact of the Mariel boatlift on the Miami labor market. ILR Review, 43, 245–257.
Card, D., & Krueger, A. B. (1994). Minimum wages and employment: A case study of the fast-food
industry in New Jersey and Pennsylvania. American Economic Review, 84, 772–793.
Card, D., & Krueger, A. B. (2000). Minimum wages and employment: A case study of the fast-food
industry in New Jersey and Pennsylvania: Reply. American Economic Review, 90, 1397–1420.
Chen, G., Crossland, C., & Huang, S. (2014). Female board representation and corporate acquisition
intensity. Strategic Management Journal, 37, 303–313.
Conyon, M. J., Hass, L. H., Peck, S. I., Sadler, G. V., & Zhang, Z. (2019). Do compensation
consultants drive up CEO pay? Evidence from UK public firms. British Journal of
Management, 30, 10–29.
Courtemanche, C. J., & Zapata, D. (2014). Does universal coverage improve health? The Massachusetts
experience. Journal of Policy Analysis and Management, 33, 36–69.
Distelhorst, G., Hainmueller, J., & Locke, R. M. (2016). Does lean improve labor standards? Management
and social performance in the Nike supply chain. Management Science, 63, 707–728.
Duflo, E., Glennerster, R., & Kremer, M. (2008). Using randomization in development economics
research: A toolkit. In P. Schultz, & J. Strauss (Eds.), Handbook of development economics
(Vol. 4, pp. 3895–3962). Amsterdam, The Netherlands and Oxford, UK: Elsevier; North-Holland.
Flammer, C. (2015). Does product market competition foster corporate social responsibility? Strategic
Management Journal, 38, 163–183.
Flammer, C., & Kacperczyk, A. (2016). The impact of stakeholder orientation on innovation: Evidence
from a natural experiment. Management Science, 62, 1982–2001.
Galiani, S., Gertler, P., & Schargrodsky, E. (2005). Water for life: The impact of the privatization of
water services on child mortality. Journal of Political Economy, 113, 83–120.
Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., & Vermeersch, C. M. (2016). Impact evaluation
in practice. Washington, DC: The World Bank.
Goodman-Bacon, A. (2018). Difference-in-Differences with variation in treatment timing. NBER
Working Paper No. 25018. NBER.
He, P., & Zhang, B. (2018). Environmental tax, polluting plants' strategies and effectiveness: Evidence
from China. Journal of Policy Analysis and Management, 37, 493–520.
Holm, J. M. (2018). Successful problem solvers? Managerial performance information use to
improve low organizational performance. Journal of Public Administration Research and
Theory, 28, 303–320.
Hosken, D. S., Olson, L. M., & Smith, L. K. (2018). Do retail mergers affect competition? Evidence from
grocery retailing. Journal of Economics & Management Strategy, 27, 3–22.
Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program
evaluation. Journal of Economic Literature, 47, 5–86.
Iyer, R., Peydró, J. L., da-Rocha-Lopes, S., & Schoar, A. (2013). Interbank liquidity crunch and the firm
credit crunch: Evidence from the 2007-2009 crisis. Review of Financial Studies, 27, 347–372.
Khwaja, A. I., & Mian, A. (2008). Tracing the impact of bank liquidity shocks: Evidence from an
emerging market. American Economic Review, 98, 1413–1442.
Kumar, A., Bezawada, R., Rishika, R., Janakiraman, R., & Kannan, P. K. (2016). From social to sale: The effects
of firm-generated content in social media on customer behavior. Journal of Marketing, 80, 7–25.
Lechner, M. (2011). The estimation of causal effects by difference-in-difference methods. Foundations
and Trends® in Econometrics, 4, 165–224.
Lemmon, M., & Roberts, M. R. (2010). The response of corporate financing and investment to changes
in the supply of credit. Journal of Financial and Quantitative Analysis, 45, 555–587.
Long, S. K., Yemane, A., & Stockley, K. (2010). Disentangling the effects of health reform in
Massachusetts: How important are the special provisions for young adults? American Economic
Review, 100, 297–302.
Pierce, L., Snow, D. C., & McAfee, A. (2015). Cleaning house: The impact of information technology
monitoring on employee theft and productivity. Management Science, 61, 2299–2319.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational
studies for causal effects. Biometrika, 70, 41–55.
Schnabl, P. (2012). The international transmission of bank liquidity shocks: Evidence from an emerging
market. The Journal of Finance, 67, 897–932.
Singh, J., & Agrawal, A. (2011). Recruiting for ideas: How firms exploit the prior inventions of new
hires. Management Science, 57, 129–150.
Smith, J. A., & Todd, P. E. (2005). Does matching overcome LaLonde's critique of nonexperimental
estimators? Journal of Econometrics, 125, 305–353.
Sommers, B. D., Long, S. K., & Baicker, K. (2014). Changes in mortality after Massachusetts health care
reform: A quasi-experimental study. Annals of Internal Medicine, 160, 585–594.
White, H., & Raitzer, D. A. (2017). Impact evaluation of development interventions: A practical guide.
Mandaluyong, Philippines: Asian Development Bank.
Wing, C., Simon, K., & Bello-Gomez, R. A. (2018). Designing difference in difference studies: Best
practices for public health policy research. Annual Review of Public Health, 39, 453–469.
Wooldridge, J. M. (2012). Introductory econometrics: A modern approach (5th ed.). Mason, OH: South-
Western College Publisher.
Younge, K. A., Tong, T. W., & Fleming, L. (2014). How anticipated employee mobility affects
acquisition likelihood: Evidence from a natural experiment. Strategic Management Journal, 36,
686–708.
Corresponding author
Anders Fredriksson can be contacted at: anders.fredriksson@usp.br
... To assess the effect of the intervention, a difference in mean scores of the participants for all five outcomes were assessed at baseline (T1) and at the end of the intervention (T2) we used a difference-in-difference (DiD) analysis [37]. The DID estimate was modelled using clustering and time-fixed effects, with the treatment occurring at the cluster and time levels [38]. ...
... Other measures of disadvantage such as housing quality and years of education completed by parents were comparable across both groups ( Table 2). Table 3 presents the results of the DiD analysis [37] for all the outcomes in girls and boys in the intervention and control groups for both crude and adjusted models. The mean scores of the resilience measure, CD-RISC, improved significantly for intervention boys (DiD (adjusted) = 5.82; 95% CI: 1.57, 9.74) compared to wait-control boys, and also among intervention girls (DiD = 4.12; 95% CI: 2.14, 6.09) compared to wait-control girls. ...
Article
Background Mental health problems are the leading cause of disease burden among young people in India. While evidence shows that youth mental health and resilience can be improved with group interventions in school settings, such an intervention has not been robustly evaluated in informal urban settings. Objective This study aimed to evaluate whether the Nae Disha 3 group intervention could improve youth resilience, mental health and gender equal attitudes among disadvantaged young people from low-income urban communities in India. Methods This cluster randomised controlled trial used an analytic sample of 476 adolescents and young adults aged 11–25 years from randomised clusters in urban Dehradun, India. The 251 intervention group participants were 112 boys and 139 girls, and the 225 young people in the wait-control group were 101 boys and 124 girls. Five validated tools measuring resilience gender equity and mental health were filled by participants at three different points in time. Results Difference in difference (DiD) analysis at T2 showed that scores improved among girls in intervention group, for adjusted model, resilience (DiD = 4.12; 95% CI: 2.14, 6.09) and among boys, for resilience (DiD = 5.82; 95% CI: 1.57, 9.74). Conclusions The Nae Disha 3 intervention among disadvantaged urban youth moderately improved resilience for both young men and women, though it did not significantly impact mental health, self-efficacy, or gender-equal attitudes. We establish potential merit for this approach to youth mental health but recommend further research to examine active ingredients and the ideal duration of such group interventions.
... The treatment group of the DiD regression consists of listed companies in Germany and Austria that experience the implementation of SRD II; the control group consists of listed companies in Switzerland, which do not experience the implementation of SRD II. Switzerland constitutes a suitable control group country because it is comparable to Germany and Austria, but it is not a member of EU and EEA and, thus, is not under effect of SRD II (Fredriksson and Oliveira 2019). In addition, Switzerland does not experience changes in compensation-related regulation in 2019. ...
Article
Full-text available
Over the last decade, CEO pay has increased drastically. The Shareholder Rights Directive II (“SRD II”) enacted by the EU is considered one solution to reduce potentially excessive executive pay but its effectiveness is unclear. To this end, this study investigates the impact of SRD II on the level and structure of CEO compensation in German and Austrian firms, compared to Swiss firms that did not experience a change in compensation-related regulation. Findings reveal that SRD II is not effective in reducing executive pay levels but promotes the use of deferred pay.
... Важное условие применения метода -предположение о том, что между группой воздействия и контрольной не существует различий во времени и если бы объекты тритмент-группы не подверглись воздействию, то динамика их результатов была бы аналогичной объектам контрольной. Это условие позволяет утверждать, что изменение результата у объектов тритментгруппы связано именно с эффектом политики, поскольку все остальные факторы внешней среды влияют на объекты обеих групп одинаково (см.: Bertrand et al., 2004;Gertler et al., 2016;Fredriksson, de Oliveira, 2019). ...
Article
In this article, we analyze the policy of direct subsidizing of academia— industry cooperation projects in Russia. Using the difference in differences method and companies’ microdata, we assess the policy impact on the change in the revenue growth rates of 133 subsidy recipient companies in 2010—2022. It is established that subsidies have the most noticeable impact on small and mediumsized enterprises (SMEs) and companies from high-tech industries. Additionally, using logit regression based on surveys in 2017 and 2022, we determine that research organizations which have used this measure are characterized by the presence of young researchers, access to foreign scientific and technical information databases, and experience in academia—industry cooperation. At the same time, organizations are not interested in this subsidy if they already used other financial instruments (for example, grants from research foundations), had orders from state corporations, and a high level of international scientific interactions. Based on the results of the study, recommendations have been developed to improve public policy by differentiating mechanisms to support academia—industry cooperation for large companies and SMEs, concentrating resources on high-tech industries and strengthening universities’ access to young talent and global knowledge databases.
... For the primary outcomes, a twoway difference-in-difference (DID) approach within a generalised linear model framework, with binomial distribution and log link, was used to estimate the DID estimator, reported as both the OR and the risk difference (RD) (ie, difference in average outcome in the treatment group before and after treatment minus the difference in average outcome in the control group before and after treatment). 33 The 'svyglm' approach was used to account for complex sampling design, along with a fixed catchment effect and cluster robust SEs. Imbalances in potential confounding variables across arms and time points were addressed by evaluating the influence of individual (age, gender, occupation type, nationality, travel history) and site-level (rainfall and altitude) factors associated with the outcome. ...
Article
Full-text available
Background Agricultural worksites are rarely targeted by malaria control programmes, yet may play a role in maintaining local transmission due to workers’ high mobility, low intervention coverage and occupational exposures. Methods A quasi-experimental controlled intervention study was carried out in farming and cattle herding populations in northern Namibia to evaluate the impact of a targeted malaria intervention package. Eight health facility catchment areas in Zambezi and Ohangwena Regions were randomised to an intervention arm and eligible individuals within worksites in intervention areas received targeted drug administration with artemether-lumefantrine, mop-up indoor residual spraying and long-lasting insecticidal nets, combined with distribution of topical repellent in Zambezi Region. Impact on malaria outcomes and intervention coverage was evaluated over a single transmission season using pre-intervention and post-intervention cross-sectional surveys in a random subset of worksites and community incidence from passively detected cases. Entomological collections and residual efficacy assays on canvas and tarpaulin were conducted. Results Delivery of a single intervention round was associated with a reduction in the prevalence of malaria (OR 0.24, 95% CI 0.1 to 0.5; risk difference (RD) −6.0%, 95% CI −9.4 to –2.8). Coverage of at least one intervention increased (RD 51.6%, 95% CI 44.4 to 58.2) among the target population in intervention compared with control areas. This effect was largely driven by results in Zambezi Region, which also observed a decline in community incidence (−1.29 cases/1000 person-weeks, 95% CI −2.2 to –0.3). Residual efficacy of pirimiphos-methyl (Actellic) on tarpaulin and canvas was high at 24hours but declined to 44.6% at 4 months. Conclusion The study shows that targeted delivery of malaria interventions to cattle herders and agricultural workers at worksites has potential to impact local transmission. 
Findings highlight the need for further research on the role of key populations in Plasmodium falciparum transmission in Namibia. Trial registration number NCT04094727.
Article
This brief report provides preliminary independent evidence of the efficacy of Football Beyond Borders (FBB), a targeted, school-based social and emotional learning (SEL) intervention for at-risk youth. FBB includes weekly SEL classroom sessions, activities on the football pitch, 1:1 therapy sessions, holiday support, and rewards trips. Propensity score matching and difference-in-differences estimation were used in a pre-test/post-test control group design to assess the impact of FBB on the mental wellbeing (assessed via the Short Warwick–Edinburgh Mental Wellbeing Scale, SWEMWBS) of participants designated at-risk ( N = 46 aged 12–14, 78.3% male), passive learners ( N = 72, aged 12–14, 84.7% male), and role models ( N = 35, aged 12–14, 85.7% male), with matched control samples derived from a subset of the #BeeWell cohort study ( N = 8015). A significant intervention effect was observed for at-risk youth, with FBB leading to an increase of approximately 2.4 SWEMWBS points ( d = 0.44). No significant intervention effects were observed for passive learners or role models. These results indicate that FBB can improve the mental wellbeing of at-risk youth. Accordingly, an explanatory trial is warranted.
Article
Zambia runs an agricultural input support program for 900,000 rural households, primarily targeting maize, the staple crop. A new delivery mode was introduced to the program, initially allowing farmers in 16 of the 115 districts to choose inputs using electronic vouchers, with the aim of encouraging crop diversification, amongst other objectives. Despite the potential benefits of this reform from a theoretical perspective, farmers may not always be able to diversify their crops due to existing barriers. In this paper, we examine how the electronic voucher reform impacted crop diversification and rotation practices at the household level during the pilot phase. The paper combines data from surveys conducted over two waves with 1518 rural households, high‐resolution satellite rainfall data and in‐depth qualitative interviews with 23 key informants. We find evidence that the reform had a positive impact (an increase of 0.231 points on the Simpson index of diversification) on crop diversification. However, there is no significant direct impact on crop rotation. We nevertheless observed that crop rotation can gain impetus only if farmers fully embrace crop diversification. Results from the qualitative interviews suggest that the limited effectiveness of electronic vouchers could be due to inadequacies in private sector input and output markets, as well as cultural preferences. Several important policy implications arise from these findings, including the need to promote markets for alternative crops and enhance extension services.
Article
Agricultural extension programs must demonstrate their value to compete for limited government funding. As extension professionals measure the value of their programs, the risk exists that the information they report will provide a biased or an inaccurate measure of value. We examine the evaluation process for extension programs and extension personnel to identify potential sources of bias or inaccuracies. We find that bias and inaccuracy in program evaluation often stems from a focus on short‐term outcomes, rather than long‐term impacts, while bias in personnel evaluation can result from information asymmetries that exist between extension personnel and their evaluators.
Article
Importance In 2024, the US Preventive Services Task Force (USPSTF) reversed a 2009 policy recommending that only females aged 50 to 74 years complete a biennial mammogram. Understanding whether females facing heterogeneous breast cancer risks responded to the 2009 guidance may illuminate how they may respond to the latest policy update. Objective To evaluate whether the 2009 policy was associated with changes in mammography screening among females no longer recommended to complete a biennial mammogram and whether these changes varied by factors associated with breast cancer risk. Design, Setting, and Participants The difference-in-differences design compared biennial mammogram trends in the exposed groups (aged 40-49 and ≥75 years) with trends in the unexposed groups (aged 50-64 and 65-74 years), before and after the 2009 update. Population-based, repeated cross-sectional survey data came from the Behavioral Risk Factor Surveillance System (BRFSS) biennial cancer screening module (2000-2018). The sample was restricted to females aged 40 to 84 years. Data were analyzed from March 1 to June 30, 2024. Main Outcomes and Measures The outcome was a binary variable indicating whether the respondent reported a mammogram in the past 2 years (biennial). After 2009, females aged 40 to 49 years and 75 years or older were exposed to the policy update, as a biennial mammogram was no longer recommended for these groups. Subgroup analyses included race and ethnicity, educational level, household income, smoking history, current binge drinking status, and state of residence. Results The sample included 1 594 834 females; 75% reported a biennial mammogram. In those aged 40 to 49 years, the USPSTF update was associated with a 1.1 percentage-point decrease (95% CI, −1.8 to −0.3 percentage points) in the probability of a biennial mammogram, with the largest decreases in the non-Hispanic Black population (−3.0 percentage points; 95% CI, −5.5 to −0.5 percentage points).
In the group aged 75 years or older, the USPSTF update was associated with a 4.8 percentage-point decrease (95% CI, −6.3 to −3.5 percentage points) in the probability of a biennial mammogram, with significant heterogeneity by race and ethnicity, binge drinking status, and state of residence. Conclusions and Relevance In this study, socioeconomic factors were associated with differences in how females responded to the 2009 USPSTF mammography recommendation. Whether the 2024 update considered such differences is unclear. These findings suggest that incorporating risk assessment into future USPSTF policy updates may improve adoption of risk-reducing interventions and shorten the time to diagnosis and treatment for high-risk patients.
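To make the arithmetic behind such estimates concrete, the 2x2 DiD logic can be sketched from four group proportions: the change over time in the exposed group, net of the change in the unexposed group. The numbers below are hypothetical placeholders, not figures from the study:

```python
# Difference-in-Differences from four cell proportions (hypothetical numbers):
# share reporting a biennial mammogram, before vs. after the 2009 update,
# for an exposed age group vs. a still-recommended (unexposed) group.
exposed_pre, exposed_post = 0.740, 0.735        # e.g. aged 40-49 (exposed)
unexposed_pre, unexposed_post = 0.780, 0.786    # e.g. aged 50-64 (unexposed)

# Change in each group over time
change_exposed = exposed_post - exposed_pre        # -0.005
change_unexposed = unexposed_post - unexposed_pre  # +0.006

# DiD estimate: the exposed group's change net of the common time trend
did = change_exposed - change_unexposed
print(f"DiD estimate: {did * 100:+.1f} percentage points")
```

With these made-up inputs the estimate is −1.1 percentage points; the regression version of the same comparison additionally yields a confidence interval.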
Article
Initiatives to build research universities, world-class institutions that create and disseminate scientific knowledge in knowledge-based economies, are among the most important policy responses in higher education systems. With increasing demands for greater accountability, transparency, and efficiency, studies investigating the excellence initiatives of different countries are growing in number. However, the literature on the Research University Project of Turkey is still in its infancy. We therefore conducted a quasi-experimental study to evaluate the effect of the research university project using six-year panel data from 2015 to 2020. Results from the difference-in-differences analysis showed that Turkey's research university initiative was not successful in differentiating research universities from non-research universities, even though research universities have not lost their quantitative superiority over non-research universities. However, the results also revealed a possible spillover effect of the initiative on non-research universities: the policy intervention may have created institutional competition and an isomorphic science-production pattern among universities seeking research university status.
Book
This book offers guidance on the principles, methods, and practice of impact evaluation. It contains material for a range of audiences, from those who may use or manage impact evaluations to applied researchers. Impact evaluation is an empirical approach to estimating the causal effects of interventions, in terms of both magnitude and statistical significance. Expanded use of impact evaluation techniques is critical to rigorously derive knowledge from development operations and for development investments and policies to become more evidence-based and effective. To support wider use of impact evaluation approaches, this book introduces core concepts, methods, and considerations for planning, designing, managing, and implementing impact evaluation, supplemented by examples. The topics covered range from impact evaluation purposes to basic principles, specific methodologies, and guidance on field implementation. The intended audiences range from those interested in understanding evidence on “what works” in development to applied researchers who will contribute to expanding the evidence base.
Article
The difference-in-differences (DID) design is a quasi-experimental research design that researchers often use to study causal relationships in public health settings where randomized controlled trials (RCTs) are infeasible or unethical. However, causal inference poses many challenges in DID designs. In this article, we review key features of DID designs with an emphasis on public health policy research. Contemporary researchers should take an active approach to the design of DID studies, seeking to construct comparison groups, sensitivity analyses, and robustness checks that help validate the method's assumptions. We explain the key assumptions of the design and discuss analytic tactics, supplementary analysis, and approaches to statistical inference that are often important in applied research. The DID design is not a perfect substitute for randomized experiments, but it often represents a feasible way to learn about causal relationships. We conclude by noting that combining elements from multiple quasi-experimental techniques may be important in the next wave of innovations to the DID approach.
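As a minimal sketch of the canonical two-group, two-period DID regression this review discusses, the following simulates data with a known treatment effect and recovers it from the interaction coefficient. The sample size and coefficient values are illustrative assumptions, and the estimator is written out with plain NumPy least squares rather than a dedicated econometrics package:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a two-period repeated cross-section: outcome y for treated and
# control units, before and after an intervention with a true effect of 2.0.
n = 2000
treated = rng.integers(0, 2, n)  # group indicator (1 = treated group)
post = rng.integers(0, 2, n)     # period indicator (1 = after intervention)
true_effect = 2.0
y = (1.0 + 0.5 * treated + 0.3 * post
     + true_effect * treated * post + rng.normal(0, 1, n))

# DID regression: y = b0 + b1*treated + b2*post + b3*(treated*post) + e.
# The interaction coefficient b3 is the DID estimate of the treatment effect.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"DID coefficient: {beta[3]:.2f} (true effect {true_effect})")
```

Under the parallel trends assumption, b3 isolates the treatment effect because the group dummy absorbs fixed level differences and the period dummy absorbs the common time trend.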
Article
Do compensation consultants drive up CEO pay for the benefit of managers, or do they design pay packages to benefit firm owners? Using a large sample of UK firms from the FTSE All-Share Index over the 2003-2011 period, we show a positive correlation between the presence of compensation consultants and CEO pay. Importantly, isolating this effect is somewhat dependent on the endogenous selection of consultants and the statistical modelling strategy deployed. We find evidence that compensation consultants improve CEO compensation design when their expertise is of greater importance (e.g. during the post-financial crisis period, or for firms that have particularly weak compensation policies). In addition, our findings show that compensation consultants increase CEO pay-performance sensitivity. The balance of evidence supports optimal contracting theory more than managerial power theory, though the authors caution about the limits of this verification. We note that the more compelling evidence for the positive effect of pay consultants is based on advanced methods (such as propensity score matching and difference-in-differences), and that more standard approaches (such as OLS and fixed effects) are unlikely to support the same causal interpretation of consultants' effect on CEO pay.
Article
Performance management is increasingly the norm for public organizations. Although there is an emergent literature on performance information use, we still know little about how managers engage in functional performance management practices. At the same time, growing evidence suggests that managers face pressure to improve low performance because of a negativity bias in the political environment. In managerial performance information use, however, the negativity bias might be reconsidered as a prioritization heuristic with positive performance attributes, directing attention to organizational goals with a favorable return on investment. I test this argument with data from public schools. A fixed-effects estimation is used to analyze how principals prioritize when they are provided with performance information on a number of different educational goals. Furthermore, a difference-in-differences model tests whether the prioritization of certain goals has performance-enhancing effects over time. The analysis shows that principals prioritize goals with low performance and that these prioritizations result in performance increases. The improvements primarily occur for goals that have a low performance level and that are repeatedly prioritized.
Article
Program evaluation methods are widely applied in economics to assess the effects of policy interventions and other treatments of interest. In this article, we describe the main methodological frameworks of the econometrics of program evaluation. In the process, we delineate some of the directions along which this literature is expanding, discuss recent developments, and highlight specific areas where new research may be particularly fruitful.
Article
Although environmental taxes have become a popular policy tool, their effectiveness for pollution control and their impact on the compliance strategies of agents remain questionable. This research uses a quasi-experimental design to examine the effectiveness of the Pay for Permit policy, an environmental tax that has been imposed on water pollution emissions in Lake Tai Basin, Jiangsu, China, since 2009. A plant-level panel dataset from 2007 to 2010 is used for both difference-in-differences and difference-in-difference-in-differences analyses to compare the pollution discharge, pollution abatement, and pollution generation of policy participants and control groups. The results indicate that treated plants reduced their emissions by about 40 percent within two years of the policy's implementation. Thus, the policy generated approximately a 7 percent decrease in industrial chemical oxygen demand emissions in the entire Lake Tai Basin relative to the 2007 emission level. Pollution is primarily reduced via end-of-pipe abatement rather than cleaner production. Our results show the effectiveness of environmental taxes in controlling industrial pollution and indicate that the tax may not motivate the adoption of innovative techniques in the short term.
Article
In empirical work in economics it is common to report standard errors that account for clustering of units. Typically, the motivation given for the clustering adjustments is that unobserved components in outcomes for units within clusters are correlated. However, because correlation may occur across more than one dimension, this motivation makes it difficult to justify why researchers use clustering in some dimensions, such as geographic, but not others, such as age cohorts or gender. It also makes it difficult to explain why one should not cluster with data from a randomized experiment. In this paper, we argue that clustering is in essence a design problem, either a sampling design or an experimental design issue. It is a sampling design issue if sampling follows a two-stage process in which, in the first stage, a subset of clusters is sampled randomly from a population of clusters and, in the second stage, units are sampled randomly from the sampled clusters. In this case the clustering adjustment is justified by the fact that there are clusters in the population that we do not see in the sample. Clustering is an experimental design issue if the assignment is correlated within the clusters. We take the view that this second perspective best fits the typical setting in economics where clustering adjustments are used. This perspective allows us to shed new light on three questions: (i) when should one adjust the standard errors for clustering, (ii) when is the conventional adjustment for clustering appropriate, and (iii) when does the conventional adjustment of the standard errors matter.
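The practical point can be illustrated numerically: when the regressor and part of the error vary only at the cluster level, the conventional iid standard error understates uncertainty, while the cluster-robust "sandwich" estimator accounts for the within-cluster correlation. The simulation below is a hedged sketch with made-up parameters, computing both standard errors by hand in NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)

# 50 clusters of 20 units; the regressor x and a random shock both vary
# only at the cluster level, so errors are strongly correlated within clusters.
G, m = 50, 20
x_cluster = rng.normal(0, 1, G)
cluster_shock = rng.normal(0, 1, G)
cluster_id = np.repeat(np.arange(G), m)
x = np.repeat(x_cluster, m)
y = 1.0 + 2.0 * x + np.repeat(cluster_shock, m) + rng.normal(0, 1, G * m)

# OLS fit of y on a constant and x
X = np.column_stack([np.ones(G * m), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Conventional (iid) standard error of the slope
sigma2 = resid @ resid / (G * m - 2)
se_iid = np.sqrt(sigma2 * XtX_inv[1, 1])

# Cluster-robust (sandwich) standard error: sum scores within clusters first
meat = np.zeros((2, 2))
for g in range(G):
    idx = cluster_id == g
    s = X[idx].T @ resid[idx]  # cluster-level score vector
    meat += np.outer(s, s)
se_cluster = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])

print(f"iid SE: {se_iid:.3f}, cluster-robust SE: {se_cluster:.3f}")
```

In this design the cluster-robust standard error is several times the iid one, which is why DiD studies with treatment assigned at the state or plant level routinely cluster at that level.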
Article
This study estimates the price effects of horizontal mergers in the U.S. grocery retailing industry. We examine fourteen regions affected by mergers, including mergers in highly concentrated and relatively unconcentrated markets. We identify price effects by comparing markets affected by mergers to unaffected markets, using difference-in-differences estimation with three different comparison groups and with propensity score weights, and by using the synthetic control method. Our results are robust to the choice of control group and estimation technique. We find that mergers in highly concentrated markets are most frequently associated with price increases, and mergers in less concentrated markets are most often associated with price decreases.