This paper assesses modelling choices available to researchers using multilevel (including longitudinal) data. We present key features, capabilities, and limitations of fixed (FE) and random (RE) effects models, including the within-between RE model, sometimes misleadingly labelled a ‘hybrid’ model. We show the latter is unambiguously a RE model, and the most general of the three models, encompassing strengths of both FE and RE. As such, and because it allows for important extensions—notably random slopes—we argue it should be the starting point of all multilevel analyses. Simulations reveal the extent to which these models cope with mis-specification, showing (1) failing to include random slopes can lead to anti-conservative standard errors, and (2) mis-specifying non-Normal random effects as Normally distributed can introduce small biases to variance/random-effect estimates, but not fixed-part estimates. We conclude with advice for applied researchers, supporting good methodological decision-making in multilevel/longitudinal data analysis. See also earlier publication: https://www.researchgate.net/publication/233756428_Explaining_Fixed_Effects_Random_Effects_Modeling_of_Time-Series_Cross-Sectional_and_Panel_Data
This content is subject to copyright. Terms and conditions apply.
Quality & Quantity (2019) 53:1051–1074
https://doi.org/10.1007/s11135-018-0802-x
Fixed and random effects models: making an informed choice
Andrew Bell1 · Malcolm Fairbrother2 · Kelvyn Jones3
Published online: 7 August 2018
© The Author(s) 2018
Abstract
This paper assesses the options available to researchers analysing multilevel (including lon-
gitudinal) data, with the aim of supporting good methodological decision-making. Given
the confusion in the literature about the key properties of fixed and random effects (FE
and RE) models, we present these models’ capabilities and limitations. We also discuss
the within-between RE model, sometimes misleadingly labelled a ‘hybrid’ model, showing
that it is the most general of the three, with all the strengths of the other two. As such, and
because it allows for important extensions—notably random slopes—we argue it should
be used (as a starting point at least) in all multilevel analyses. We develop the argument
through simulations, evaluating how these models cope with some likely mis-specifica-
tions. These simulations reveal that (1) failing to include random slopes can generate anti-
conservative standard errors, and (2) assuming random intercepts are Normally distributed,
when they are not, introduces only modest biases. These results strengthen the case for the
use of, and need for, these models.
Keywords Multilevel models · Fixed effects · Random effects · Mundlak · Hybrid models · Within and between effects
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11135-018-0802-x) contains supplementary material, which is available to authorized users.
* Andrew Bell
andrew.j.d.bell@sheffield.ac.uk
Malcolm Fairbrother
Malcolm.fairbrother@umu.se
Kelvyn Jones
kelvyn.jones@bristol.ac.uk
1 Sheffield Methods Institute, University of Sheffield, ICOSS, 219 Portobello, Sheffield S1 4DP, UK
2 Sociology Department, Umeå University, Hus Y, Beteendevarhuset, Mediagränd 14, Beteendevetarhuset, Umeå universitet, 90187 Umeå, Sweden
3 School of Geographical Sciences and Centre for Multilevel Modelling, University of Bristol, University Road, Bristol BS8 1SS, UK
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1 Introduction
Analyses of data with multiple levels, including longitudinal data, can employ a variety
of different methods. However, in our view there is significant confusion regarding these
methods. This paper therefore presents and clarifies the differences between two key
approaches: fixed effects (FE) and random effects (RE) models. We argue that in most
research scenarios, a well-specified RE model provides everything that FE provides and
more, making it the superior method for most practitioners (see also Shor et al. 2007;
Western 1998). However, this view is at odds with the common suggestion that FE is often
preferable (e.g. Vaisey and Miles 2017), if not the “gold standard” (e.g. Schurer and Yong
2012). We thus address widespread misunderstandings about FE and RE models, such as
those from the literature’s use of confusing terminology (including the phrase ‘random
effects’ itself—see for example Gelman 2005) and/or different disciplines’ contradictory
approaches to the same important methodological questions.
In addition to this synthesis of the inter-disciplinary methodological literature on FE and
RE models (information that, whilst often misunderstood, is not new), we present an origi-
nal simulation study showing how various forms of these models respond in the presence
of some plausible model mis-specifications. The simulations show that estimated standard
errors are anti-conservative when random-slope variation exists but a model does not allow
for it. They also show the robustness of estimation results to mis-specification of random
effects as Normally distributed, when they are not; substantial biases are confined to vari-
ance and random effect estimates in models with a non-continuous response variable.
The paper begins by outlining what both FE and RE aim to account for: clustering
or dependence in a dataset, and differing relationships within and between clusters. We
then present our favoured model: a RE model that allows for distinct within and between
effects,1 which we abbreviate “REWB”, with heterogeneity modelled at both the cluster
(level 2) and observation (level 1) level. Focussing first on the fixed part of the model, we
show how the more commonly used FE, RE and pooled OLS models can be understood
as constrained and more limited versions of this model; indeed, REWB is our favoured
model because of its encompassing nature. Section 3 of this paper focuses on the different treatment of level-2 entities in FE and RE models, and some of the advantages of the RE approach. In Sect. 4, we consider some important extensions to the REWB model that cannot be as effectively implemented under a FE or Ordinary Least Squares framework: ‘random slopes’ allowing the associations between variables to vary across higher-level entities, further spatial and temporal levels of analysis, and explicit modelling of complex level 1 heteroscedasticity. We show that implementing these extensions can often be of paramount importance and can make results more nuanced, accurate, and informative. Section 5 then considers models with a non-continuous response variable, and some of the distinct challenges that such data present, before considering the assumptions made by the RE
model and the extent to which it matters when those assumptions are violated. The article
concludes with some practical advice for researchers deciding what model they should use
and how.
1 The use of the term ‘effect’ in the phrase ‘within effect’, ‘between effect’ and ‘contextual effect’ should
not imply that these should necessarily be interpreted as causal. This caution applies to the phrases ‘random
effects’ and ‘fixed effects’ as well.
2 Within, between and contextual effects: conceptualising the fixed part of the model
Social science datasets often have complex structures, and these structures can be highly
relevant to the research question at hand, and not merely a convenience in the research
design that has become a nuisance in the analysis. Often, observations (at level 1) are clus-
tered into groups of some kind (at level 2). Such two-level data structures are the main
focus of this paper, though data are sometimes grouped at further levels, yielding three (or
more) levels. Some of the most common multilevel structures are outlined in Table1. In
broad terms, these can be categorised into two types: cross-sectional data, where individu-
als are nested within a geographical or social context (e.g. individuals at level 1, within
schools or countries at level 2), and longitudinal data, where individuals or social units are
measured on a number of occasions. In the latter context, this means occasions (at level
1) are nested within the individual or entity (now at level 2). In all cases these structures
represent real and interesting societal configurations; they are not simply a technicality or
consequence of survey methodology, as the population may itself be structured by social
processes, distinctions, and inequalities.
Structures are important in part because variables can be related at more than one level
in a hierarchy, and the relationships at different levels are not necessarily equivalent. Cross-
sectionally, for example, some social attitude (Y) may be related to an individual’s income
X (at level 1) very differently than to the average income in their neighbourhood, country,
or region (level 2). A classic example of this comes from American politics. American states with higher incomes tend to elect more Democratic than Republican politicians, but within states richer voters tend to support Republican rather than Democratic candidates (Gelman 2008).
Longitudinally, people might be affected by earning what is, for them, an unusually high
annual income (level 1) in a different way than they are affected by being high-earners
generally across all years (level 2). The same can hold for whole societies: Europeans for
example demand more income redistribution from their governments in times of greater
inequality—relative to the average for their country—even though people in consistently
more unequal countries do not generally demand more redistribution (Schmidt-Catran
2016). Thus, we can have “within” effects that occur at level 1, and “between” or “contex-
tual” effects that occur at level 2 (Howard 2015), and these three different effects should
not be assumed to be the same.
Sometimes it is the case that within effects are of the greatest interest, especially when
policy interventions are evaluated. With panel data, for example, within effects can cap-
ture the effect of an independent variable changing over time. Many studies have argued
for focusing on the longitudinal relationships because unobserved, time-invariant differ-
ences between the level 2 entities are then controlled for (Allison 1994; Halaby 2004, see
Sect.2.3). Christmann (2018) for example shows that people are more satisfied with the
functioning of democracy in their country during times of good economic performance—a
within-country effect that shows the value of improving economic performance.
Yet between effects in longitudinal studies are often equally illuminating, despite being
by definition non-changing—as evidenced by the many published studies that rely exclu-
sively on cross-sectional data. Similarly, in cross-sectional studies, the effects of wider
social contexts on individuals can also be extremely relevant. Social science is concerned
with understanding the world as it exists, not just dynamic changes within it. Thus with a
panel dataset for example, it will often be worth modelling associations at the higher level,
Table 1 Some hierarchical structures of data common in social science

| Broad category | Data type | Level 1 | Level 2 | Level 3 |
|---|---|---|---|---|
| Cross-sectional | Clustered survey data (Maimon and Kuhl 2008) | Individuals | Neighbourhoods | |
| Cross-sectional | Cross-national survey data (Ruiter and van Tubergen 2009) | Individuals | Countries | |
| Cross-sectional | Surveys with multiple items (Deeming and Jones 2015; Sampson et al. 1997) | Items | Individuals | |
| Panel | Country time-series cross-sectional data (Beck and Katz 1995; Western 1998) | Occasions | Countries | |
| Panel | Individual panel data (Lauen and Gaddis 2013) | Occasions | Individuals | |
| Panel at level 1, cross-sectional at level 2 | Panel data on individuals who are clustered (Kloosterman et al. 2010) | Occasions | Individuals | Schools |
| Cross-sectional at level 1, panel at level 2 | Comparative longitudinal survey data (Fairbrother 2014; Schmidt-Catran and Spies 2016), or repeated cross-sectional data (Duncan et al. 1996) | Individuals | Country-years/region-years | Countries/regions |

For more elaboration of hierarchical and non-hierarchical structures, see Rasbash (2008)
in order to understand the ways in which individuals differ—not just the ways in which
they change over time (see, for example, Subramanian et al. 2009). We take it as axiomatic
that we need both micro and macro associations to understand the whole of ‘what is going
on’.
2.1 The most general: within-between RE and Mundlak models
We now outline some statistical models that aim to represent these processes. Taking a
panel data example, where individuals i (level 2) are measured on multiple occasions t
(level 1), we can conceive of the following model—the most general of the models that
we consider in this paper. This specification is able to model both within- and between-
individual effects concurrently, and also explicitly models heterogeneity in the effect of
predictor variables at the individual level:
$$y_{it} = \mu + \beta_{1W}(x_{it} - \bar{x}_i) + \beta_{2B}\bar{x}_i + \beta_3 z_i + \upsilon_{i0} + \upsilon_{i1}(x_{it} - \bar{x}_i) + \epsilon_{it0} \tag{1}$$
Here $y_{it}$ is the dependent variable, $x_{it}$ is a time-varying (level 1) independent variable, and $z_i$ is a time-invariant (level 2) independent variable. The variable $x_{it}$ is divided into two with each part having a separate effect: $\beta_{1W}$ represents the average within effect of $x_{it}$, whilst $\beta_{2B}$ represents the average between effect of $x_{it}$.2 The $\beta_3$ parameter represents the effect of time-invariant variable $z_i$, and is therefore in itself a between effect (level 2 variables cannot have within effects since there is no variation within higher-level entities). Further variables could be added as required.
The random part of the model includes two terms at level 2, a random effect ($\upsilon_{i0}$) attached to the intercept and a random effect ($\upsilon_{i1}$) attached to the within slope, that between them allow heterogeneity in the within-effect of $x_{it}$ across individuals. Each of these is usually assumed to be Normally distributed (as discussed later in this paper).
We will demonstrate in Sect. 4 that specifying heterogeneity at level 2 (with the term $\upsilon_{i1}(x_{it} - \bar{x}_i)$ in Eq. 1) can be important for avoiding biases, in particular in standard errors, and this is a key problem with FE and ‘standard’ RE models. However, to clarify the initial arguments of the first part of this paper, we consider a simplified version of this model that assumes homogeneous effects across level 2 entities:
$$y_{it} = \beta_0 + \beta_{1W}(x_{it} - \bar{x}_i) + \beta_{2B}\bar{x}_i + \beta_3 z_i + (\upsilon_i + \epsilon_{it}) \tag{2}$$
Here $\upsilon_i$ are the model’s (homogeneous) random effects for individuals i, which are assumed to be Normally distributed. The $\epsilon_{it}$ are the model’s (homoscedastic) level 1 residuals, which are also assumed to be Normally distributed (we will discuss models for non-Gaussian outcomes, with different distributional assumptions, later).
An alternative parameterisation to Eq. 2 (with the same distributional assumptions) is the ‘Mundlak’ formulation (Mundlak 1978):
$$y_{it} = \beta_0 + \beta_{1W}x_{it} + \beta_{2C}\bar{x}_i + \beta_3 z_i + (\upsilon_i + \epsilon_{it}) \tag{3}$$
2 Note that the variable $\bar{x}_i$ associated with $\beta_{2B}$ could be calculated using only observations for which there is a full data record, though if more data exists this could be included in the calculation of $\bar{x}_i$, to improve the estimate of $\beta_{2B}$. Alternatively, calculating $(x_{it} - \bar{x}_i)$ with only observations included in the model ensures $\beta_{1W}$ is estimated using only within-unit variation. In practice, the difference between these modelling choices is usually negligible.
Here $x_{it}$ is included in its raw form rather than the de-meaned form $(x_{it} - \bar{x}_i)$. Instead of the between effect $\beta_{2B}$, the Mundlak model estimates the “contextual effect” $\beta_{2C}$. The key difference between these two, as spelled out both graphically and algebraically by Raudenbush and Bryk (2002:140), is that the raw value of the time-varying predictor ($x_{it}$) is controlled for in the estimate of the contextual effect in Eq. 3, but not in the estimate of the between effect in Eq. 2. Thus if the research question at hand is “what is the effect of a (level 1) individual moving from one level-2 entity to another”, the contextual effect ($\beta_{2C}$) is of more interest, since it holds the level 1 individual characteristics constant. In contrast, if we simply want to know “what is the effect of changing the level of $\bar{x}_i$, without keeping the level of $x_{it}$ constant?”, the between effect ($\beta_{2B}$) will provide an answer to that. With longitudinal data, the contextual effect is fairly meaningless: it doesn’t make sense for an observation (level 1) to move from one (level 2) individual to another, because each observation by definition belongs to a specific individual. It therefore makes little sense to control for those observations in estimating the level 2 effect. As such, the between effect, and thus the REWB model, is generally more informative. When using cross-sectional data, the contextual effect is of interest (since we can imagine level 1 individuals moving between level 2 entities without altering their own characteristics). It can thus measure the additional effect of the level 2 entity, once the individual-level characteristic has been accounted for. The between effect can also be interpreted, but a significant effect could be produced as a result of the composition of level 1 entities, without a country-level construct driving the effect. Note, however, that these models are equivalent, since $\beta_{1W} + \beta_{2C} = \beta_{2B}$; each model conveys the same information and will fit the data equally well and we can obtain one from the other with some simple arithmetic.3
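The arithmetic linking the two parameterisations can be checked numerically. What follows is a minimal sketch under stated assumptions, not the authors' code: it fits only the fixed parts of Eqs. 2 and 3 by ordinary least squares (no random effects are estimated), because the identity $\beta_{1W} + \beta_{2C} = \beta_{2B}$ follows from how the predictors are coded rather than from the estimator. The helper `ols` and all data values are hypothetical.

```python
def ols(columns, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        pivot = a[p][p]
        a[p] = [v / pivot for v in a[p]]
        for r in range(k):
            if r != p:
                f = a[r][p]
                a[r] = [vr - f * vp for vr, vp in zip(a[r], a[p])]
    return [a[i][k] for i in range(k)]

# Hypothetical balanced panel: 4 individuals (level 2), 3 occasions each (level 1)
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 5.0, 6.0, 7.0, 7.0, 9.0, 8.0]
y = [2.1, 2.9, 4.2, 6.8, 9.1, 10.9, 13.2, 14.1, 14.8, 19.0, 21.2, 19.9]

groups = sorted(set(ids))
xbar_by_id = {g: sum(xi for xi, i in zip(x, ids) if i == g) / ids.count(g) for g in groups}
xbar = [xbar_by_id[i] for i in ids]                  # group mean, repeated per row
x_within = [xi - xb for xi, xb in zip(x, xbar)]      # de-meaned predictor
ones = [1.0] * len(y)

# Fixed part of Eq. 2 (REWB): intercept, de-meaned x, group mean of x
b0_wb, beta_1w, beta_2b = ols([ones, x_within, xbar], y)
# Fixed part of Eq. 3 (Mundlak): intercept, raw x, group mean of x
b0_m, beta_1w_mundlak, beta_2c = ols([ones, x, xbar], y)

print("within effect:", round(beta_1w, 6), round(beta_1w_mundlak, 6))
print("between effect vs within + contextual:", round(beta_2b, 6),
      round(beta_1w_mundlak + beta_2c, 6))
```

The two printed pairs should agree up to rounding: the coefficient on the level 1 predictor is the same in both codings, and the between effect equals the within effect plus the contextual effect.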
In a rare recent example using cross-sectional international survey data, Fairbrother
(2016) studied public attitudes towards environmental protection, allowing for separate but
simultaneous tests both among and within countries of the associations between key atti-
tudinal variables. This permitted the identification of political trust as an especially critical
correlate of greater support for environmental protection at both the individual and national
level—an important discovery in the substantive literature.
Both the Mundlak model and the within-between random effects (REWB) model (Eqs. 3 and 2 respectively) are easy to fit in all major software packages (e.g. R, Stata, SAS, as well as more specialist software like HLM and MLwiN). They are simply random effects models with the mean of $x_{it}$ included as an additional explanatory variable (Howard 2015).
2.2 Constraining the within-between RE model: fixed effects, random effects and OLS
Having established our ‘encompassing’ model in its two alternative forms (Mundlak, and within-between), we now present three models that are currently more often used. Showing how each of these is a constrained version of Eqs. 2 or 3 above, we demonstrate the disadvantages of choosing any of them instead of the more general and informative REWB specification.
3 One potential advantage of the within-between model over the Mundlak specification is that there will be zero correlation between $\bar{x}_i$ and $(x_{it} - \bar{x}_i)$, which can facilitate model convergence. Furthermore, if there is problematic collinearity between multiple $\bar{x}_i$’s, some or all of these can be omitted without affecting the estimates of $\beta_{1W}$.
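The zero correlation between the group mean and the de-meaned predictor holds by construction, as a short check illustrates. This is a sketch on made-up data; the function names and values are hypothetical.

```python
from collections import defaultdict

def within_between_split(ids, x):
    """Return (group mean repeated per row, deviation from the group mean)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, xi in zip(ids, x):
        totals[i] += xi
        counts[i] += 1
    xbar = [totals[i] / counts[i] for i in ids]
    dev = [xi - xb for xi, xb in zip(x, xbar)]
    return xbar, dev

def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Hypothetical unbalanced panel with three units
ids = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]
x = [1.0, 2.0, 6.0, 4.0, 8.0, 3.0, 5.0, 7.0, 9.0]

xbar, dev = within_between_split(ids, x)
cov_parts = covariance(xbar, dev)   # zero: deviations sum to zero within every unit
cov_raw = covariance(xbar, x)       # positive: raw x and its group mean overlap

print(cov_parts, cov_raw)
```

Because the deviations sum to zero inside each unit, the result does not depend on the panel being balanced. The raw predictor used in the Mundlak coding, by contrast, is correlated with its own group mean.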
2.2.1 Random effects without within and between separation
One commonly used model uses the random effects framework, but does not estimate separate relationships at each of the two levels:
$$y_{it} = \beta_0 + \beta_1^{RE}x_{it} + \beta_3^{RE}z_i + (\upsilon_i + \epsilon_{it}) \tag{4}$$
This approach effectively assumes that $\beta_{1W} = \beta_{2B}$, or equivalently that $\beta_{2C} = 0$, in Eqs. 2 and 3 (Bell et al. 2018). Where this assumption is valid, this model is a good choice, and has benefits over the more general model. Specifically, the estimate of $\beta_1^{RE}$ will be more efficient than the estimates of $\beta_{1W}$ or $\beta_{2B}$ in Eq. 2, because it can utilise variation at both the higher and lower level (e.g. Fairbrother 2014; Halaby 2004). However, when $\beta_{1W} \neq \beta_{2B}$, the model will produce a weighted average of the two,4 which will have little substantive meaning (Raudenbush and Bryk 2002:138). Fortunately, it is easy to test whether the assumption of equal within and between effects is true, by testing the equality of the coefficients in the REWB model, or the significance of the contextual effect in the Mundlak model (for example via a Wald test). If there is a significant difference (and not just that the between effect is significantly different from zero) the terms should not be combined, and the encompassing within-between or Mundlak model should be used. This was done by Hanchane and Mostafa (2012) considering bias with this model for school (level 2) and student (level 1) performance. They found that in less selective school systems (Finland), there was little bias and a model like Eq. 4 was appropriate, whilst in more selective systems (UK and Germany) the more encompassing model of Eq. 3 was necessary to take account of schools’ contexts and estimate student effects accurately.
This is, in fact, what is effectively done by the oft-used ‘Hausman test’ (Hausman 1978). Although often (mis)used as a test of whether FE or RE models “should” be used (see Fielding 2004), it is really a test of whether there is a contextual effect, or whether the between and within effects are different. This equates in the panel case to whether the changing within effect (e.g. for an effect of income: the effect of being unusually well paid, such as after receiving a non-regular bonus or a pay rise) is different from the cross-sectional effect (being well paid on average, over the course of the period of observation). Even when within and between effects are slightly different, it may be that the bias in the estimated effect is a price worth paying for the gains in efficiency, depending on the research question at hand (Clark and Linzer 2015). Either way, it is important to test whether the multilevel model in its commonly applied form of Eq. 4 is an uninterpretable blend of two different processes.
4 Specifically, the estimate will be weighted as: $\beta_{ML} = \frac{w_W\beta_W + w_B\beta_B}{w_W + w_B}$, where $w_W$ is the precision of the within estimate, that is $w_W = \frac{1}{(SE_{\beta_W})^2}$, and $w_B$ is the precision of the between estimate, $w_B = \frac{1}{(SE_{\beta_B})^2}$. $\beta_W$ and $\beta_B$ are the within and between effects, respectively (estimated as $\beta_{1W}$ and $\beta_{2B}$ in Eq. 2). Given the larger sample size (and therefore higher precision) of the within estimate, the model will often tend towards the within estimate, although this would depend on the extent of the unexplained level 1 and 2 variation in the model.
2.2.2 Fixed effects model
Depending on the field, perhaps the most commonly used and recommended method of dealing with differing within and between effects as outlined above is ‘fixed effects’
modelling. This approach is equivalent to that represented in Eqs. 2 and 3, except that the $\upsilon_i$ are specified as fixed effects: i.e. dummy variables are included for each higher-level entity (less a reference category) and the $\upsilon_i$ are not treated as draws from any kind of distribution. The result is that between effects (associations at the higher level) cannot be estimated, and the model can be reduced to:
$$y_{it} = \beta_1(x_{it} - \bar{x}_i) + (\upsilon_i + \epsilon_{it}) \tag{5}$$
Or reduced even further to:
$$(y_{it} - \bar{y}_i) = \beta_1(x_{it} - \bar{x}_i) + (\epsilon_{it}) \tag{6}$$
This is the model that most software packages actually estimate, such that they do not estimate the magnitudes of the fixed effects themselves. Thus, the model provides an estimate of the within effect $\beta_1$, which is not biased by between effects that differ from it.5 This is of course what is achieved by the REWB model and the Mundlak model: the REWB model employs precisely the same mean-centring as FE models. However, unlike the REWB and Mundlak specification, the de-meaned FE specification reveals almost nothing about the level-2 entities in the model. This means that many research questions cannot be answered by FE, and it can only ever present a partial picture of the substantive phenomenon represented by the model. With panel data, for example, FE models can say nothing about relationships with independent variables that do not change over time, only about deviations from the mean over time. FE models therefore “throw away important and useful information about the relation between the explanatory and the explained variables in a panel” (Nerlove 2005, p. 20).
5 Note though that, in the longitudinal setting, between effects will only be fully controlled if those effects do not change over time (this is the case with the REWB/Mundlak models as well, unless such heterogeneity is explicitly modelled).
If a researcher has no substantive interest in the between effects, their exclusion is perhaps unimportant, though even in such a case, for reasons discussed below, we think there are still reasons to disfavour the FE approach as the one and only valid approach. To be clear, the REWB and Mundlak models will give exactly the same results for the within effect (coefficient and standard error) as the FE model (see Bell and Jones 2015 for simulations; Goetgeluk and Vansteelandt 2008 for proof of consistency), but retain the between effect, which can be informative and cannot be obtained from a FE model.
2.2.3 Single level OLS regression
An even simpler option is to ignore the structure of the model entirely:
$$y_{it} = \beta_0 + \beta_1^{OLS}x_{it} + \beta_3^{OLS}z_i + (\epsilon_{it}) \tag{7}$$
Thus, we assume that all observations in the dataset are conditionally independent. This has two problems. First, as with the standard RE model, the estimate of $\beta_1^{OLS}$ will be a potentially uninterpretable weighted average6 of the within and between effects (if they are not equal). Furthermore, if there are differences between level 2 entities (that is, if there are effects of unmeasured higher-level variables), standard errors will be estimated as if all observations are independent, and so will be generally underestimated, especially for parameters associated with higher-level variables, including between and contextual effects.7 Fortunately, the necessity of modelling the nested structure can readily be evaluated, by running the model both with and without the higher-level random effects and testing which is the better fitting model by a likelihood ratio test (Snijders and Bosker 2012:97), AIC, or BIC.
6 This will actually be a different weighted average to that produced by RE: it is weighted by the proportion of the variance in $x_{it}$ that exists at each level, so where the within-unit variance of $x_{it}$ is negligible, the estimate will be close to that of the between effect, and vice versa. More formally, $\beta_{SL} = (1 - \rho_x)\beta_W + \rho_x\beta_B$, where $\rho_x$ is the proportion of the variance in $x_{it}$ occurring at the higher level.
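The claim that the single-level slope blends the within and between slopes, with weights given by the share of the variation in the predictor at each level, can be verified on toy data. The sketch below makes several simplifying assumptions: a single predictor, no level 2 covariate, a balanced panel (so that the slope through the group means is the between slope), and hypothetical data chosen so that the two slopes have opposite signs. No standard errors are computed.

```python
def slope(x, y):
    """Simple-regression slope of y on x (intercept handled by centring)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def group_means(ids, v):
    """Group mean of v, repeated on every row of the group."""
    means = {g: sum(vi for vi, i in zip(v, ids) if i == g) / ids.count(g) for g in set(ids)}
    return [means[i] for i in ids]

# Hypothetical balanced panel: negative slope within units, positive slope between them
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [3.0, 2.0, 1.5, 6.0, 5.5, 4.0, 9.5, 8.0, 7.5]

xbar, ybar = group_means(ids, x), group_means(ids, y)
x_dev = [a - b for a, b in zip(x, xbar)]
y_dev = [a - b for a, b in zip(y, ybar)]

beta_w = slope(x_dev, y_dev)    # within slope, from de-meaned data
beta_b = slope(xbar, ybar)      # between slope, through the group means
beta_ols = slope(x, y)          # pooled single-level slope

grand = sum(x) / len(x)
ss_between = sum((xb - grand) ** 2 for xb in xbar)
ss_total = sum((xi - grand) ** 2 for xi in x)
rho_x = ss_between / ss_total   # share of the variation in x at the higher level

weighted = (1 - rho_x) * beta_w + rho_x * beta_b
print(beta_w, beta_b, beta_ols, weighted)
```

Here most of the variation in the predictor lies between units, so the pooled slope sits close to the between slope and hides the negative within-unit relationship entirely.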
2.3 Omitted variable bias in the within-between RE model
We hope the discussion above has convinced readers of the superiority of the REWB
model, except perhaps when the within and between effects are approximately equal, in
which case the standard RE model (without separated within and between effects) might
be preferable for reasons of efficiency.8 Even then, the REWB model should be considered
first, or as an alternative, since the equality of the within and between coefficients should
not be assumed. As for FE, except for simplicity there is nothing that such models offer that
a REWB model does not.
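Checking whether the within and between coefficients differ can be sketched as a Wald-type z test on their difference. This is an illustrative sketch only: the estimates and standard errors are hypothetical, the covariance between the two estimates is an input that is assumed to be zero unless supplied, and a Normal reference distribution is assumed.

```python
import math

def wald_equality_test(b_within, se_within, b_between, se_between, cov=0.0):
    """z statistic and two-sided p-value for H0: within effect == between effect."""
    diff = b_within - b_between
    se_diff = math.sqrt(se_within ** 2 + se_between ** 2 - 2.0 * cov)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided Normal tail probability
    return z, p

# Hypothetical REWB estimates (not taken from any study)
z, p = wald_equality_test(b_within=0.40, se_within=0.05,
                          b_between=0.90, se_between=0.20)
print(round(z, 3), round(p, 4))
```

A small p-value would suggest keeping the two effects separate. A large one does not by itself establish that they are equal, which is one reason to treat the combined model with caution.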
All of the models we consider here are subject to a variety of biases, such as selection bias (Delgado-Rodríguez and Llorca 2004), or bias arising when the direction of causality assumed by the model is wrong (e.g. see Bell, Johnston, and Jones 2015). Most significant for our present purposes is the possibility of omitted variable bias.
As with fixed effects models, the REWB specification prevents any bias on level 1 coefficients due to omitted variables at level 2. To put it another way, there can be no correlation between level 1 variables included in the model and the level 2 random effects; such biases are absorbed into the between effect, as confirmed by simulation studies (Bell and Jones 2015; Fairbrother 2014). When using panel data with repeated measures on individuals, unchanging and/or unmeasured characteristics of an individual (such as intelligence, ability, etc.) will be controlled out of the estimate of the within effect. However, unobserved time-varying characteristics can still cause biases at level 1 in either an FE or a REWB/Mundlak model. Similarly, in REWB/Mundlak models, unmeasured level 2 characteristics can cause bias in the estimates of between effects and effects of other level 2 variables.
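How de-meaning removes an unmeasured, unchanging characteristic can be seen in a deliberately noise-free example. The sketch below is hypothetical throughout: the outcome is built from a known within effect plus a time-invariant trait (`ability`) that is never used in estimation and is larger for individuals with higher average values of the predictor.

```python
def slope(x, y):
    """Simple-regression slope of y on x (intercept handled by centring)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def demean(ids, v):
    """Subtract each unit's own mean from its observations."""
    sums, counts = {}, {}
    for i, vi in zip(ids, v):
        sums[i] = sums.get(i, 0.0) + vi
        counts[i] = counts.get(i, 0) + 1
    return [vi - sums[i] / counts[i] for i, vi in zip(ids, v)]

TRUE_WITHIN = 0.5
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Unobserved, time-invariant trait; higher where x is higher on average
ability = {1: 0.0, 2: 2.0, 3: 5.0}
y = [TRUE_WITHIN * xi + ability[i] for i, xi in zip(ids, x)]

pooled = slope(x, y)                             # contaminated by the omitted trait
within = slope(demean(ids, x), demean(ids, y))   # trait cancels out of the deviations
print(pooled, within)
```

The de-meaned slope recovers the within effect exactly because the trait is constant within each individual. A time-varying omitted variable would not cancel in this way.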
This is a problem if we wish to know the direct causal effect of a level 2 variable: that
is, what happens to Y when a level 2 variable increases or decreases, such as because of an
intervention (Blakely and Woodward 2000). However, this does not mean that those esti-
mated relationships are worthless. Indeed, often we are not looking for the direct, causal
effect of a level 2 variable, but see these variables as proxies for a range of unmeasured
social processes, which might include those omitted variables themselves. As an example,
in a panel data structure when considering the relationship between ethnicity (an unchang-
ing, level 2 variable) and a dependent variable, we would not interpret any association
found to be the direct causal effect of any particular genes or skin pigmentation; rather we
7 One could add a group mean variable to this equation, as in Eqs. 2 or 3. Whilst this would solve the issue of bias of the point estimates, standard errors would still be underestimated.
8 This is not necessarily the case, however: if there are substantive reasons for suspecting that the processes
driving the two effects are different then it makes sense to use SEs that treat the processes as separate.
Moreover, it may be that subsequent elaboration of the model (addition of variables, etc.) would lead to
within and between effects diverging—researchers are best served by being cautious about combining the
two.
are interested in the effects of the myriad of unmeasured social and cultural factors that
are related to ethnicity. If a direct genetic effect is what we are looking for, then our esti-
mates are likely to be ‘biased’, but we hope most reasonable researchers would not inter-
pret such coefficients in this way. As long as we interpret any coefficient estimates with
these unmeasured variables in mind, and are aware that such reasoning is as much concep-
tual and theoretical as it is empirical, such coefficients can be of great value in helping us
to understand patterns in the world through a model-based approach. Note that if we are,
in fact, interested in a direct causal effect and are concerned by possible omitted variables,
then instrumental variable techniques can sometimes be employed within the RE framework (for example, see Chatelain and Ralf 2018; Steele et al. 2007).
The logic above also applies to estimates of between and contextual effects. These
aggregated variables are proxies of group level characteristics that are to some extent
unmeasured. As such, it is not a problem in our view that, in the case of panel data, future
data is being used to form this variable and predict past values of the dependent varia-
ble—these values are being used to get the best possible estimate of the unchanging group-
level characteristic. If researchers want these variables to be more accurately measured,
they could be precision-weighted, to shrink them back to the mean value for small groups
(Grilli and Rampichini 2011; Shin and Raudenbush 2010).
3 Fixed and random effects: conceptualising the random part of the model
This section aims to clarify further the statistical and conceptual differences between RE
(including REWB) and FE modelling frameworks. The obvious difference between the two
models is in the way that the level-2 entities are treated: that is, $\upsilon_i$ in Eqs. 2 and 5.
In a RE model (whether standard, REWB or Mundlak) level-2 random effects are
treated as random draws from a Normal distribution, the variance of which is estimated:

$$\upsilon_i \sim N(0, \sigma^2_{\upsilon}). \tag{8}$$

In contrast, a FE model treats level-2 entities as unconnected: $\upsilon_i$ in Eq. 5 are dummy
variables for higher-level entity i, each with separately estimated coefficients (less a reference
category, or with the intercept suppressed). Because these dummy variables account
for all the higher-level variance, no other variables measured at the higher level can be
identified.
In both specifications, the level-1 variance is typically assumed to follow a Normal
distribution:

$$\epsilon_{it} \sim N(0, \sigma^2_{\epsilon}). \tag{9}$$

To us, this is what the ‘random’ and ‘fixed’ in RE and FE mean. In contrast, others
argue that the defining feature of the RE model is an assumption that that model makes.
Vaisey and Miles (2017:47) for example state:
The only difference between RE and FE lies in the assumption they make about the
relationship between υ [the unobserved time-constant fixed/random effects] and the
observed predictors: RE models assume that the observed predictors in the model are
not correlated with υ while FE models allow them to be correlated.
Such views are also characteristic of mainstream econometrics:
In modern econometric parlance, “random effect” is synonymous with zero correlation
between the observed explanatory variables and the unobserved effect …
the term “fixed effect” does not usually mean that $c_i$ [$\upsilon_i$ in our notation] is being
treated as nonrandom; rather, it means that one is allowing for arbitrary correlation
between the unobserved effect $c_i$ and the observed explanatory variables $x_{it}$. So, if
$c_i$ is called an “individual fixed effect” or a “firm fixed effect,” then, for practical
purposes, this terminology means that $c_i$ is allowed to be correlated with $x_{it}$. (Wooldridge
2002:252)
No doubt this assumption is important (see Sect. 2.3). But regardless of how well established
this definition is, it is misleading. This assumption is not the only difference between
RE and FE models, and is far from being either model’s defining feature.
The different distributional assumptions affect the extent to which information is considered
exchangeable between higher-level entities: are they unrelated, or is the value of
one level-2 entity related to the values of the others? In the FE framework, nothing can be
known about each level-2 entity from any or all of the others: they are unrelated and each
exists completely independently. At the other extreme, a single-level model assumes there
are no differences between the higher-level entities; in a sense, knowing one is sufficient to
know them all. RE models strike a balance between these two extremes, treating higher-level
entities as distinct but not completely unlike each other. In practice, the random intercepts
in RE models will correlate strongly with the fixed effects in a ‘dummy variable’ FE
model, but RE estimates will be drawn in or ‘shrunk’ towards their mean, with unreliably
estimated and more extreme values shrunk the most.
Why does it matter that the random effects are drawn from a common distribution? We
have already stated that FE models estimate coefficients on higher-level dummy variables
(the fixed effects), and cannot estimate coefficients on other higher-level variables (between
effects). RE models can yield estimates for coefficients on higher-level variables because
the random effects are parameterised as a distribution instead of dummy variables. More-
over, RE automatically provides an estimate of the level 2 variance, allowing an overall
measure of the extent to which level-2 entities differ in comparison to the level 1 variance.
Further, this variance can be used to produce ‘shrunken’ (or ‘Empirical Bayes’) higher-
level residuals which, unlike FE dummy-variable parameter estimates, take account of the
unreliability of those estimates; for an application, see Ard and Fairbrother (2017). The
degree of “shrinkage” (or exchangeability across level-2 entities) in a RE model is determined
from the data, with more shrinkage if there are few observations and/or the estimated
variance of the level-2 entities, $\sigma^2_{\upsilon}$, is small (see Jones and Bullen 1994; Spiegelhalter
2004).
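The dependence of shrinkage on group size and on the level-2 variance can be made concrete with the standard shrinkage (reliability) factor for a random-intercept model with a continuous response. The following is a minimal sketch rather than anything taken from the paper's own code: the function names are hypothetical, and the variance values are chosen purely for illustration.

```python
def shrinkage_factor(var_u, var_e, n_obs):
    """Weight given to a group's raw mean residual in a random-intercept model.

    var_u: level-2 (between-group) variance
    var_e: level-1 (within-group) variance
    n_obs: number of observations in the group
    Values near 1 mean little shrinkage; values near 0 mean strong shrinkage.
    """
    return var_u / (var_u + var_e / n_obs)


def shrunken_residual(raw_mean_residual, var_u, var_e, n_obs):
    """Empirical Bayes ('shrunken') residual for one level-2 entity."""
    return shrinkage_factor(var_u, var_e, n_obs) * raw_mean_residual


# Illustrative values only (not estimates from any dataset).
print(round(shrinkage_factor(4.0, 1.0, 10), 3))   # 0.976: large group, large level-2 variance
print(round(shrinkage_factor(4.0, 1.0, 2), 3))    # 0.889: fewer observations, more shrinkage
print(round(shrinkage_factor(0.25, 1.0, 2), 3))   # 0.333: small level-2 variance, strong shrinkage
```

A FE dummy-variable estimate corresponds, loosely, to using the raw mean residual with a factor of 1 regardless of group size.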
If we are interested in whether individuals’ responses are related to their specific contexts
(neighbourhoods, schools, countries, etc.) a fixed effects model can help answer this
question if dummy variables for level-2 entities are estimated, but this is done unreliably
with small level-2 entities. A RE model can give us more reliable, appropriately conservative
estimates of this (Bell et al. 2018), as well as telling us whether that context matters
in general, based on the significance of the estimated variance of the random effects.9 It
can tell us about both differences in higher-level effects (termed ‘type A’ effects in the education
9 This could also be done on the basis of a Wald test of the joint significance of FE dummy variables, but
this is not possible with non-linear outcomes where dummy coefficients are not estimated.
literature, Raudenbush and Willms 1995) and the effects of variables at the higher level
(‘type B’ effects). FE estimators cannot estimate the latter.
The view of FE and RE being defined by their assumptions has led many to characterise
the REWB model as a ‘hybrid’ between FE and RE, or even a ‘hybrid FE’ model (e.g.
Schempf et al. 2011). We hope the discussion above will convince readers that this model
is a RE model. Indeed, Paul Allison, who (we believe) introduced the terminology of the
Hybrid model (Allison 2005, 2009) now prefers the terminology of ‘within-between RE’
(Allison 2014).
The label matters, because FE models (and indeed ‘hybrid’ models) are often presented
as a technical solution, following and responding to a Hausman test taken to mean that a
RE model cannot be used.10 As such, researchers rarely consider what problem FE actually
solves, and why the RE parameter estimates were wrong. This bias is often described as
‘endogeneity’, a term that covers a wide and disparate range of different model misspeci-
fications (Bell and Jones 2015:138). In fact, the Hausman test simply investigates whether
the between and within effects are different—a possibility that the REWB specification
allows for. REWB (a) recognises the possibility of differences between the within and
between effects of a predictor, and (b) explicitly models those separate within and between
effects. The REWB model is a direct, substantive solution to a mis-specified RE model in
allowing for the possibility of different relations at each level; it models between effects,
which may be causing the problem, and are often themselves substantively interesting.
When treated as a FE model, this substance is often lost.
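The idea that a predictor can have different within and between effects, and that the REWB specification separates them, can be sketched with simulated data. This is an illustrative sketch only: it uses plain least squares on the REWB fixed-part specification rather than a fitted RE model, so it shows point estimates but not standard errors or variance components. The true effects (within = 1, between = 3), the sample sizes, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_obs = 300, 10

# Predictor with both between-group and within-group variation.
group_mean = rng.normal(0.0, 1.0, size=(n_groups, 1))
x = group_mean + rng.normal(0.0, 1.0, size=(n_groups, n_obs))
x_bar = x.mean(axis=1, keepdims=True)   # between component
x_within = x - x_bar                    # within component

# Assumed DGP: within effect 1, between effect 3, random intercepts u.
u = rng.normal(0.0, 1.0, size=(n_groups, 1))
e = rng.normal(0.0, 1.0, size=(n_groups, n_obs))
y = 1.0 + 1.0 * x_within + 3.0 * x_bar + u + e


def ols(columns, y):
    """Least-squares coefficients (intercept first) for the given regressors."""
    X = np.column_stack([np.ones(y.size)] + [c.ravel() for c in columns])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef


# REWB fixed part: separate coefficients for the within and between components.
x_bar_full = np.broadcast_to(x_bar, x.shape)
_, beta_within, beta_between = ols([x_within, x_bar_full], y)

# A model that forces a single coefficient mixes the two effects.
_, beta_pooled = ols([x], y)

# FE (within) estimator, computed by group-mean centring both x and y.
y_within = y - y.mean(axis=1, keepdims=True)
beta_fe = np.sum(x_within * y_within) / np.sum(x_within ** 2)

print("within:", beta_within, "between:", beta_between)
print("single pooled coefficient:", beta_pooled)
print("FE within estimate:", beta_fe)
```

In this sketch the within coefficient coincides with the FE estimate, while the single pooled coefficient falls between the within and between effects, which is the situation a Hausman-type comparison would flag.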
Further, using the REWB model as if it were a FE model leads researchers to use it
without taking full advantage of the benefits that RE models can offer. The RE framework
allows a wider range of research questions to be investigated: involving time-invariant vari-
ables, shrunken random effects, additional hierarchical (e.g. geographical) levels and, as
we discuss in the next section, random slopes estimates that allow relationships to vary
across individuals, or allow variances at any level to vary with variables. As well as yield-
ing new, substantively interesting results, such actions can alter the average associations
found. Describing the REWB, or Hybrid, model as falling under a FE framework therefore
undersells and misrepresents its value and capabilities.
4 Modelling more complexity: random slopes models and three-level models
4.1 Random slopes models
So far, all models have assumed homogeneity in the within effect associated with $x_{it}$.
This is often a problematic assumption. First, such models hide important and interesting
heterogeneity. And second, models that incorrectly assume homogeneity will produce
misleading estimates, in particular anticonservative standard errors, as we show below. The RE/REWB model as
previously described also suffers from this shortcoming, but can more easily avoid it by
explicitly modelling such heterogeneity, with the inclusion of random slopes (Western
10 Many (e.g. Greene 2012:421) even argue that the Mundlak or REWB model can be used as a form of the
Hausman test, which could itself be used to justify the use of FE, even though the REWB model makes
that choice unnecessary.
1998). These allow the coefficients on lower-level covariates to vary across level-2 entities.
Equation 2 then becomes:

$$y_{it} = \beta_0 + \beta_{1W}(x_{it} - \bar{x}_i) + \beta_{2B}\bar{x}_i + \beta_3 z_i + \upsilon_{i0} + \upsilon_{i1}(x_{it} - \bar{x}_i) + \epsilon_{it} \tag{10}$$

Here $\beta_{1W}$ is a weighted average (Raudenbush and Bloom 2015) of the within effects
in each level-2 entity; $\sigma^2_{\upsilon 1}$ measures the extent to which these within effects vary between
level-2 entities (such that each level-2 entity i has a within effect estimated as $\beta_{1W} + \upsilon_{i1}$).
The two random terms $\upsilon_{i1}$ and $\upsilon_{i0}$ are assumed to be draws from a bivariate Normal
distribution, meaning Eq. 8 is extended to:

$$\begin{bmatrix} \upsilon_{i0} \\ \upsilon_{i1} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \sigma^2_{\upsilon 0} & \\ \sigma_{\upsilon 01} & \sigma^2_{\upsilon 1} \end{bmatrix}\right) \tag{11}$$

The meaning of individual coefficients can vary depending on how variables are
scaled and centred. However, the covariance term indicates the extent of ‘fanning in’
(with negative $\sigma_{\upsilon 01}$) or ‘fanning out’ (positive $\sigma_{\upsilon 01}$) from a covariate value of zero (Bullen
et al. 1997). In many cases, there is substantive heterogeneity in the size of associations
among level-2 entities. Table 2 shows two examples of reanalyses where including
random coefficients makes a real difference to the results. Both are analyses of countries,
rather than individuals, but the methodological issues are similar. The first is a
reanalysis of an influential study in political science (Milner and Kubota 2005) which
claims that political democracy leads to economic globalisation (measured by countries’
tariff rates). When including random coefficients in the model, not only does the overall
within effect disappear, but a single outlying country, Bangladesh, turns out to be driving
the relationship (Bell and Jones 2015, Appendix). The second example is the now-infamous
study in economics by Reinhart and Rogoff (2010), which claimed that higher
levels of public debt cause lower national economic growth (a conclusion that remained
even after the Herndon et al. (2014) corrections). In this case, although the coefficient
does not change with the introduction of random slopes, the standard error triples in
size, and the within effect is no longer statistically significant when, in addition, time is
appropriately controlled (Bell et al. 2015).
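The ‘fanning in’ and ‘fanning out’ interpretation of the covariance term follows from the variance of the level-2 part of Eq. 10 under the distribution in Eq. 11. As a short worked derivation, writing $\tilde{x}$ for a given value of the centred covariate $(x_{it} - \bar{x}_i)$ (a shorthand introduced here, not notation from the paper):

```latex
\operatorname{Var}\left(\upsilon_{i0} + \upsilon_{i1}\tilde{x}\right)
  = \sigma^2_{\upsilon 0} + 2\sigma_{\upsilon 01}\tilde{x} + \sigma^2_{\upsilon 1}\tilde{x}^2,
\qquad
\left.\frac{d}{d\tilde{x}}\operatorname{Var}\left(\upsilon_{i0} + \upsilon_{i1}\tilde{x}\right)\right|_{\tilde{x}=0}
  = 2\sigma_{\upsilon 01}.
```

The between-entity variance is therefore a quadratic function of the covariate. Moving upwards from a covariate value of zero, it initially increases when $\sigma_{\upsilon 01}$ is positive (the entity-specific lines fan out) and initially decreases when $\sigma_{\upsilon 01}$ is negative (the lines fan in), reaching its minimum at $\tilde{x} = -\sigma_{\upsilon 01}/\sigma^2_{\upsilon 1}$.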
Table 2 Results from reanalyses of Milner and Kubota (2005) and Reinhart and Rogoff (2010)

| Original study/studies | Milner and Kubota (2005) | Reinhart and Rogoff (2010) and Herndon et al. (2014) |
| --- | --- | --- |
| Reanalysis | Bell and Jones (2015) (appendix) | Bell et al. (2015) |
| Dependent variable | Tariff rates | Economic growth (ΔGDP) |
| Independent variable of interest | Democracy (polity score) | National debt (%GDP) |
| REWB/FE within estimate (SE) | −0.227 (0.086)** | −0.021 (0.003)*** |
| Random slopes estimate (SE) | 0.143 (0.187) (NS) | −0.021 (0.009)* |
| Notes | Effect further reduced by the removal of a single outlying country, Bangladesh. | Effect becomes insignificant when time is appropriately controlled. |

Standard errors are in parentheses
For full details of the models used, see the reanalysis papers themselves
NS not significant
P values *** < 0.001; ** < 0.01; * < 0.05
In both cases, not only is substantively interesting heterogeneity missed in models
assuming homogeneous associations, but also inference about the within effects is anticonservative (that is,
SEs are underestimated). Leaving aside the substantive interest that can be gained from
seeing how different contexts can lead to different relationships, failing to consider how
associations differ across level-2 entities can produce misleading results if such differences
exist. Although such heterogeneity can be modelled in a FE framework with the addition
of multiple interaction terms, it rarely is in practice, and that heterogeneity does not benefit
from shrinkage as in the RE framework. Thus, a FE model can lead an analyst to miss
problematic assumptions of homogeneity that the model is making. A RE model, including
the REWB model, allows for the modelling of important complexities, such as heterogeneity
across level-2 entities.
We further demonstrate this using a simulation study. We simulated data sets with:
either 60 groups of 10, or 30 groups of 20; random intercepts distributed Normally, Chi
square, Normally but with a single large outlier, or with unbalanced groups; with only ran-
dom intercepts, or both random intercepts and random slopes; and with y either Normal or
binary (logit). This produced 32 data-generating processes (DGPs) in total. We then fitted
three different models to each simulated dataset: FE, random intercept, and random slope.
For the FE models, we calculated both naive and robust SEs.
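The count of 32 data-generating processes follows from crossing the four design factors just listed. As a small sketch (the labels are paraphrased from the text, and the variable names are hypothetical):

```python
from itertools import product

# Design factors of the simulation study, as described in the text.
group_structures = ["60 groups of 10", "30 groups of 20"]
intercept_scenarios = [
    "Normal",
    "Chi square",
    "Normal with a single large outlier",
    "unbalanced groups",
]
random_parts = ["random intercepts only", "random intercepts and random slopes"]
responses = ["Normal", "binary (logit)"]

# Full factorial design: every combination of the four factors is one DGP.
dgps = list(product(group_structures, intercept_scenarios, random_parts, responses))
print(len(dgps))  # 2 * 4 * 2 * 2 = 32
```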
Figure1 shows the ‘optimism’—the ratio of the true sampling variability to the sam-
pling variability estimated by the standard error (see Shor etal. 2007)—for a single covari-
ate, in a variety of scenarios.11 In the scenarios presented in the top row, the DGP included
only random intercepts, not random slopes; the lower row represents DGPs with both
Fig. 1 Optimism of the standard errors in various models. Note Triangles are for logistic models, circles for
Normal models; blue means 60 groups of ten, red 30 groups of 20. (Color figure online)
11 See the “Appendix” of the present paper for the full explanation and R code to replicate these simula-
tions in ESM.
random intercepts and random slopes. FE models are in the first two columns (with naïve
and robust standard errors), random-intercepts models the third column, and random slopes
models in the right-hand column.
Figure1 shows that where random slopes are not included in the analysis model (all but
the right-most column), but exist in the data in reality (bottom row), the standard errors
are overoptimistic—they are too small relative to the true sampling variability. When there
is variation in the slopes across level-2 entities, there is more uncertainty in the beta esti-
mates, but this is not reflected in the standard error estimates unless those random slopes
are explicitly specified. In the top row, in contrast, all four columns look the same: here
there is no mismatch between the invariant relationships assumed by the analysis models
and present in the data. In the presence of heterogeneity, note that while FE models with
naive SEs are the most anticonservative, neither FE models with “robust” standard errors
nor RE models with only random intercepts are much better.
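The optimism of naive FE standard errors can be illustrated with a small simulation. This is a minimal sketch and not the authors' R code: it fits only a within (FE) estimator with naive SEs, covers only the continuous-response case with 60 groups of 10, and borrows the parameter values stated in the Appendix (intercept and slope of 1, random-effect variances of 4, level-1 variance of 1, x drawn with standard deviation 0.25). Applying the random slope to the uncentred x is an assumption, and the function names are hypothetical.

```python
import numpy as np


def within_fit(x, y):
    """FE (within) slope estimate and its naive standard error.

    x, y: arrays of shape (n_groups, n_obs).
    """
    xd = x - x.mean(axis=1, keepdims=True)   # group-mean centre x
    yd = y - y.mean(axis=1, keepdims=True)   # group-mean centre y
    sxx = np.sum(xd ** 2)
    beta = np.sum(xd * yd) / sxx
    resid = yd - beta * xd
    df = x.size - x.shape[0] - 1             # N minus group dummies minus one slope
    se = np.sqrt(np.sum(resid ** 2) / df / sxx)
    return beta, se


def optimism(slope_var, n_groups=60, n_obs=10, n_sims=500, seed=1):
    """Ratio of the true sampling SD of the slope to the mean estimated SE."""
    rng = np.random.default_rng(seed)
    betas, ses = [], []
    for _ in range(n_sims):
        x = rng.normal(0.0, 0.25, size=(n_groups, n_obs))
        u0 = rng.normal(0.0, 2.0, size=(n_groups, 1))                 # intercept variance 4
        u1 = rng.normal(0.0, np.sqrt(slope_var), size=(n_groups, 1))  # slope variance
        e = rng.normal(0.0, 1.0, size=(n_groups, n_obs))
        y = 1.0 + (1.0 + u1) * x + u0 + e
        beta, se = within_fit(x, y)
        betas.append(beta)
        ses.append(se)
    return np.std(betas, ddof=1) / np.mean(ses)


print("no random slopes in DGP:", optimism(slope_var=0.0))
print("random slopes in DGP:   ", optimism(slope_var=4.0))
```

Under these assumptions the ratio should sit close to 1 when the DGP has no random slopes and clearly above 1 when it does, mirroring the contrast between the top and bottom rows of Fig. 1.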
These results support the strong critique by Barr et al. (2013) that not to include random
slopes is anticonservative. On the other hand, Matuschek et al. (2017) counter that analytical
models should also be parsimonious, and fitting models with many random effects
quickly multiplies the number of parameters to be estimated, particularly since random
slopes are generally given covariances as well as variances. Sometimes the data available
will not be sufficient to estimate such a model. Still, it will make sense in much applied
work to test whether a statistically significant coefficient remains so when allowed to vary
randomly. We discuss this further in the conclusions.
4.2 Three (and more) levels, and cross-classifications
Datasets often have structures that span more than two levels. A further advantage of the
multilevel/random effects framework over fixed effects is that it allows for complex data
structures of this kind. Fixed effects models are not problematic when additional higher
levels exist (insofar as they can still estimate a within effect), but they are unable to include
a third level (if the levels are hierarchically structured), because the dummy variables at the
second level will automatically use up all degrees of freedom for any levels further up the
hierarchy. Multilevel models allow competing explanations to be considered, specifically
which level in a hierarchy matters the most, with a highly parsimonious specification (estimating
a variance parameter at each level).12
For example, cross-national surveys are increasingly being fielded multiple times in the
same set of countries, yielding survey data that are both comparative and longitudinal. This
presents a three-level hierarchical structure, with observations nested within country-years,
which are in turn nested in countries (Fairbrother 2014).13
12 The capability of analysing at multiple scales net of other scales can be exploited in a model-based
approach to segregation where the variance at a scale conveys the degree of segregation (Jones et al. 2015).
13 See Schmidt-Catran and Fairbrother (2015) for a further extension that includes a cross-classified year or
wave level.
4.3 Complex level 1 heterogeneity
A final way in which the random part of the model can be expanded is by allowing the
variance at level 1 to be structured by one or more covariates at any level. Thus, Eq. 10 is
extended to:

$$y_{it} = \mu + \beta_1(x_{it} - \bar{x}_i) + \beta_2\bar{x}_i + \beta_3 z_i + \upsilon_{i0} + \upsilon_{i1}(x_{it} - \bar{x}_i) + \epsilon_{it0} + \epsilon_{it1}(x_{it} - \bar{x}_i), \tag{12}$$

where the level 1 variance has two parts, one independent and the other related to $(x_{it} - \bar{x}_i)$.
Equation 9 is extended to:

$$\begin{bmatrix} \epsilon_{it0} \\ \epsilon_{it1} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \sigma^2_{\epsilon 0} & \\ \sigma_{\epsilon 01} & \sigma^2_{\epsilon 1} \end{bmatrix}\right) \tag{13}$$

Often this is important to do, because what appears to be higher-level variance14 between
level-2 entities can in fact be complex variance at level 1. It is only by specifying both,
as in Eq. 12, that we can be sure how variance, and varying variance, can be attributed
between levels (Vallejo et al. 2015).15
5 Generalising the RE model: binary and count dependent variables
So far, this paper has considered only models with continuous dependent variables, using
an identity link function. Do the claims of this paper apply to Generalised Linear models?
These include other dependent variables and link functions (Neuhaus and McCulloch
2006), such as logit and probit models (for binary/proportion dependent variables) and
Poisson models (for count dependent variables). Although this question has not been considered
to a great extent in the social and political sciences, the biostatistics literature does
provide some answers (for an accessible discussion of this, see Allison 2014). Here we
briefly outline some of the issues.
Unlike models using the identity link function, the REWB model with
other link functions does not produce results that are identical to FE (or the equivalent conditional
likelihood model). In other words, the inclusion of the group mean in the model
does not reliably partition any higher-level processes from the within effect, meaning both
within and between estimates of cluster-specific effects16 can be biased. This is the case
when the relationship between the between component of X ($\bar{x}_i$) and the higher-level residual
($\upsilon_i$) is non-linear. How big a problem is this? Brumback et al. (2010:1651) found that,
in running simulations, “it was difficult to find an example in which the problem is severe”
14 Note the random slopes described in 4.1 can also be conceived as varying variance. Variance could vary
by both level 1 and level 2 variables. The approach used here is standard in the multilevel literature (Gold-
stein 2010), but other approaches are possible (for example modelling the log of the variance as a function
of covariates - e.g. see Hedeker and Mermelstein 2007).
15 Although difficult to implement in some standard software packages (it cannot be implemented in the
mixed package in Stata, or lme4 in R), it can be implemented in MLwiN, which can in turn be accessed
from Stata/R using the packages runmlwin/R2MLwiN (Leckie and Charlton 2013; Zhang et al. 2016).
16 Note: we do not consider the differences between population average and cluster specific estimates in
this paper—all models considered in this section of the paper produce the latter. This debate is beyond
the scope of the paper (but see Jones and Subramanian 2013; Subramanian and O’Malley 2010 for more
on this). Both cluster specific and population average estimates may be needed depending on the research
question; this is not a debate that can or should be technically resolved.
(see also Goetgeluk and Vansteelandt 2008). In a later paper, however, Brumback et al.
(2013) did identify one such example, but only with properties unlikely to be found in real
life data (Allison 2014): $\bar{x}_i$ and $\upsilon_i$ very highly correlated, and few observations per level-2
entity.
Whether the REWB model should be used, or a conditional likelihood (FE) model
should be used instead, depends on three factors: (1) the link function, (2) the nature of the
research question, and (3) the researcher’s willingness to accept low levels of bias. Regarding
(1), many models, including negative binomial models, ordered logit models,
and probit models, do not have a conditional likelihood estimator associated with them. If
such models are to be used, the REWB model may be the best method available to produce
within effects that are (relatively) unbiased by omitted higher-level variables. Regarding
(2), conditional likelihood methods have all the disadvantages of FE mentioned above: they
are unable to provide level-2 effects, random slopes cannot be fitted, and so on, meaning
there is a risk of producing misleading and anti-conservative results. Level-2 effects and random
slopes will often be important to the research question at hand, providing a realistic level of
complexity in the modelling. Regarding (3), the level of bias is easily ascertained by comparing
the estimate of the REWB model to that of the conditional likelihood model (where available).
If the results are deemed similar enough, the researcher can be relatively sure that the
results produced by the REWB model are reasonable.
6 Assumptions of random effects models: how much do they matter?
A key assumption of RE models is that the random effects representing the level-2 enti-
ties are drawn from a Normal distribution. However, “the Normality of [the random coeffi-
cients] is clearly an assumption driven more by mathematical convenience than by empiri-
cal reality” (Beck and Katz 2007:90). Indeed, it is often an unrealistic assumption, and it is
important to know the extent to which different estimates are biased when that assumption
is broken.
The evidence from prior simulation studies is somewhat mixed, and depends on what
specifically in the RE model is of interest. For linear models with a continuous response
variable, and on the positive side, Beck and Katz (2007) find that both average parameter
estimates and random effects are well estimated, both when the random effects are assumed
to be Normally distributed but in fact have a Chi square distribution, and when there are a number
of outliers in the dataset.17 Others concur that beta estimates are generally unbiased by
non-Normal random effects, as are estimates of the random effects variances (Maas and
Hox 2004; McCulloch and Neuhaus 2011a). Random effects are only biased to a significant
degree in extreme scenarios (McCulloch and Neuhaus 2011b), and even then (for example
for random effects with a Chi square(1) distribution), the ranked order of estimated ran-
dom effects remains highly correlated (Correlation > 0.8) to the rankings of the true ran-
dom effects (Arpino and Varriale 2010), meaning substantive interpretation is likely to be
affected only minimally. This is the case whether or not the DGP includes random slopes.
In other words, a badly specified random distribution may result in some biases, but these
are usually small enough not to worry the applied researcher. If there is a concern about
17 In the latter case, outlying random effects can easily be identified and ‘dummied out’, allowing the distri-
bution of the rest of the random effects to be estimated.
bias, it may be wise to check the findings are robust to other specifications, and potentially
use models that allow for non-Normal random effects, such as Non-Parametric Maximum
Likelihood techniques (Aitkin 1999; Fotouhi 2003).
With non-linear models, the evidence is somewhat less positive. Where the Normality
assumption for the higher-level random effects is violated, there can be significant biases, particularly
when the true level 2 variance is large (as is often the case with panel data, but not with
cross-sectional data; Heagerty and Kurland 2001). For a review of these simulation studies,
see Grilli and Rampichini (2015).
Our simulations, for the most part, back up these findings and this is illustrated in
Fig. 2, which presents the consequences for various parameters if the random intercepts
have a Chi square(2) distribution, or have a single substantial outlier, and if the groups
are unbalanced. First, beta estimates are unbiased (upper-left panel), as are their standard
errors (upper-right), regardless of the true distribution of the random effects and the type
of model. Non-Normality does however have consequences for the estimate of the level-2
Fig. 2 Biases and RMSE under various (mis-)specifications. Note Triangles are for logistic models, circles
for Normal models; blue means 60 groups of ten—red 30 groups of 20. Clockwise from the upper-left, the
parameters are beta (bias), optimism of the standard errors (bias), random intercepts (RMSE), and level-2
variance (bias). (Color figure online)
variance (lower-left panel). When the true distribution is skewed (in a Chi square(2) dis-
tribution), for logistic models there is notable downward bias in the estimate of the level-
two variance, and a slight increase in the error associated with the random effects them-
selves (lower-right). We found no evidence of any similar bias in models with a continuous
response. In contrast, when the non-Normality of the random effects is due to an outlying
level-2 entity, there is an impact on the estimated variance for models with a continuous
response, and the estimated random intercepts for both logistic and Normal models. How-
ever, as noted above, the latter does not need to be problematic, because outliers can be
easily identified and ‘dummied out’, effectively removing that specific random effect from
the estimated distribution. Note that the high RMSE associated with unbalanced datasets
(lower-right) is related to the smaller sample size in some level 2 groups, rather than being
evidence of any bias.
In sum, even substantial violations of the Normality assumption of the higher-level ran-
dom effects do not have much impact on estimates in the fixed part of the model, nor the
standard errors. Such violations can however affect the random effects estimates, particu-
larly in models with a non-continuous response.
7 Conclusion: what should researchers do?
We hope that this article has presented a clear picture of the key properties, capabilities,
and limitations of FE and RE models, including REWB models. We have considered what
each of these models is, what it does, what it assumes, and how much those assumptions
matter in different real-life scenarios.
There are a number of practical points that researchers should take away from this
paper. First and perhaps most obviously is that the REWB model is a more general and
encompassing option than either FE or conventional RE, which do not distinguish between
within and between effects. Even when using non-identity link functions, or when the Nor-
mality assumption of the random effects is violated, the small biases that can arise in such
models will often be a price worth paying for the added flexibility that the REWB model
provides. This is especially the case since FE is unable to provide any estimates at all of the
parameters that are most biased by violations of Normality (specifically random effects and
variance estimates). The only reason to choose FE is if (1) higher-level variables are of no
interest whatsoever, (2) there are no random slopes in the true DGP, or (3) there are so few
level-2 entities that random slopes are unlikely to be estimable. Regarding (1) we would
argue this is rarely the case in social science, where a full understanding of the world and
how it operates is often the end goal. Regarding (2), testing this requires fitting a RE model
in any case, so the benefits of reverting to FE are moot. Regarding (3), the REWB model
will still be robust for fixed-part parameter estimates (although maximum likelihood estimation
may be biased; see McNeish 2017; Stegmueller 2013), though its efficacy relative to
FE would be very limited, since higher level parameters would be estimated with a lot of
uncertainty.
Second, the question of whether to include random slopes is important and requires
careful consideration. On the one hand, in a world of limited computing power and limited
data, it is often not feasible to allow the effects of all variables to vary between level-2
entities. On the other hand, we have shown that results can change in substantive ways
when slopes are allowed to vary randomly. We would argue that, at the least, where there
is a single substantive predictor variable of interest, it would make sense to check that the
conclusions hold when the effect of that variable is allowed to vary across clusters. One
option in this regard is to use robust standard errors, not as a correction per se, but as
a diagnostic procedure—a ‘canary down the mine’—following King and Roberts (2015).
Any difference between conventional and robust standard errors suggests there is some
kind of misspecification in the model, and that misspecification might well include the
failure to model random slopes. The two leftmost panels in the lower row in Fig. 1 show
precisely how robust standard errors will differ when a model is mis-specified in omitting
relevant random effects.
Third, and in contrast to much of the applied literature, we argue that researchers should
not use a Hausman test to decide between fixed and random effects models. Rather, they
can use this test, or models equivalent to it, to verify the equivalence of the within and
between relationships. A lack of equality should be in itself of interest and worthy of fur-
ther investigation through the REWB model.
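
As a small illustration of what separating the within and between relationships involves, the sketch below splits a level-1 predictor into its cluster mean (the between part) and the deviation from that mean (the within part). It is only a sketch: the function name `within_between` and the toy data are assumptions made for illustration, not part of the original analysis.

```python
from collections import defaultdict


def within_between(x, groups):
    """Split x into a within part (deviation from the cluster mean)
    and a between part (the cluster mean itself)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, g in zip(x, groups):
        sums[g] += value
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    x_between = [means[g] for g in groups]
    x_within = [value - means[g] for value, g in zip(x, groups)]
    return x_within, x_between


# Toy data (hypothetical): two clusters with two observations each.
x = [1.0, 3.0, 10.0, 14.0]
groups = ["a", "a", "b", "b"]
x_within, x_between = within_between(x, groups)
```

Entering both columns as predictors gives each its own coefficient, so a difference between the within and between relationships can be examined directly rather than treated only as a reason to reject RE.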
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix: The simulations
We generated datasets according to the formula

$$y_{it} = \beta_0 + \beta_1 x_{it} + \upsilon_{i0} + \epsilon_{it},$$

or in other words with random intercepts only, and also according to

$$y_{it} = \beta_0 + \beta_1 x_{it} + \upsilon_{i0} + \upsilon_{i1} x_{it} + \epsilon_{it}.$$

In this latter case, the data-generating process (DGP) included both random intercepts
and random slopes, and these random effects were distributed according to

$$\begin{bmatrix} \upsilon_{i0} \\ \upsilon_{i1} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \sigma^2_{\upsilon 0} & \\ 0 & \sigma^2_{\upsilon 1} \end{bmatrix}\right).$$

That is, the random effects were in all cases uncorrelated. We also generated binary data
based on similar models (both random intercept-only and random intercept, random slope
models), using a logit link. In all cases, $\sigma^2_{\upsilon 0}$ and $\sigma^2_{\upsilon 1}$ were set to 4, and (for the
Normally distributed data) the variance of $\epsilon_{it}$ to 1. The overall intercept $\beta_0$ and the
overall slope $\beta_1$ were also set to 1. The $x_{it}$ data were drawn from a Normal distribution
with a mean of 0 and a variance of $0.25^2$.
We fitted models to simulated data sets with either 60 groups of 10 or 30 groups of
20, yielding a total N of 600 either way.¹⁸ The 30 × 20 condition reflected that time-series
cross-sectional datasets often possess roughly those N’s at each level, and that many
cross-national survey datasets include about 30 countries. The 60 × 10 condition allowed for a
useful contrast testing the implications of varying the N at either level. We did not conduct
simulations with groups larger than 20 because of the high time costs of doing so, and
because previous simulation studies have not revealed anything particularly notable about
studies conducted with large rather than small groups (Bryan and Jenkins 2016;
Schmidt-Catran and Fairbrother 2015).
18 The N’s at each level are not typical of published studies using multilevel models. But most studies use
large N’s that would have made the simulation studies much more time-consuming to run, with no benefit
in terms of insights.
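
As a rough illustration of the data-generating process described above, the following Python sketch draws one dataset with a Normally distributed outcome. It is not the authors' R code: the function name `simulate`, its arguments and the seed are assumptions made for illustration, while the default parameter values follow the text (variances of 4 for the random effects, 1 for the residual, and a standard deviation of 0.25 for x).

```python
import random


def simulate(n_groups=30, group_size=20, random_slopes=True,
             beta0=1.0, beta1=1.0, var_u0=4.0, var_u1=4.0,
             var_e=1.0, sd_x=0.25, seed=1):
    """Draw rows (i, t, x, y) from
    y_it = beta0 + beta1 * x_it + u_i0 + u_i1 * x_it + e_it,
    with independent (uncorrelated) Normal random effects."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_groups):
        u0 = rng.gauss(0.0, var_u0 ** 0.5)            # random intercept
        # The random slope is switched off for the intercept-only DGP.
        u1 = rng.gauss(0.0, var_u1 ** 0.5) if random_slopes else 0.0
        for t in range(group_size):
            x = rng.gauss(0.0, sd_x)
            e = rng.gauss(0.0, var_e ** 0.5)
            y = beta0 + beta1 * x + u0 + u1 * x + e
            rows.append((i, t, x, y))
    return rows


data_30x20 = simulate(n_groups=30, group_size=20)
data_60x10 = simulate(n_groups=60, group_size=10)
```

Both calls return 600 rows, matching the two sample-size conditions. A binary outcome would additionally need the linear predictor passed through an inverse logit and a Bernoulli draw, which is omitted here.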
In some cases, instead of drawing the $\upsilon_{i0}$’s from a Normal distribution, we drew them
from a Chi squared distribution, or from a Normal distribution but with a single large outlier.
Where they were drawn from a Chi squared distribution, the distribution’s degrees of
freedom was set at 2, and we also subtracted 2 from each randomly drawn value, yielding
a final population mean of 0 and variance of 4, the same as in scenarios where the $\upsilon_{i0}$’s
were drawn from a Normal distribution. For the scenarios with the outlier, we tripled the
value of the element of $\upsilon_{i0}$ with the largest absolute value.
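
The two non-Normal scenarios can be sketched as follows. The helper names `chi2_effects` and `add_outlier` are hypothetical, and the Chi squared draws are built from squared standard Normals only because that avoids any dependency outside the Python standard library. This is an illustration, not the original implementation.

```python
import random


def chi2_effects(n, rng):
    """Chi squared draws with 2 degrees of freedom, shifted by -2
    so that the population mean is 0 and the variance is 4."""
    # A Chi squared variable with 2 df is the sum of two squared standard Normals.
    return [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 - 2.0 for _ in range(n)]


def add_outlier(effects):
    """Return a copy in which the element with the largest absolute value is tripled."""
    out = list(effects)
    k = max(range(len(out)), key=lambda i: abs(out[i]))
    out[k] *= 3.0
    return out


rng = random.Random(2018)
skewed = chi2_effects(30, rng)                        # one draw per level-2 entity
normal = [rng.gauss(0.0, 2.0) for _ in range(30)]     # variance 4
with_outlier = add_outlier(normal)
```

Because a Chi squared distribution with 2 degrees of freedom has mean 2 and variance 4, subtracting 2 centres it without changing its spread, which keeps the scenarios comparable.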
As a fourth possibility, we made the simulated dataset unbalanced, by resampling with
replacement a dataset of the same total size from the values of the original, with equal
probability of selection. This yielded groups of randomly varying sizes.
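
A minimal sketch of this resampling step is given below. The function name `make_unbalanced` and the seed are assumptions made for illustration, and each row is reduced to a (group, occasion) pair to keep the example short.

```python
import random
from collections import Counter


def make_unbalanced(rows, rng):
    """Resample rows with replacement and equal probability, keeping the total size."""
    return [rng.choice(rows) for _ in range(len(rows))]


rng = random.Random(42)
rows = [(g, t) for g in range(30) for t in range(20)]   # balanced: 30 groups of 20
resampled = make_unbalanced(rows, rng)
group_sizes = Counter(g for g, _ in resampled)          # now varies by group
```

The total number of rows stays at 600, but the number of rows per group is no longer fixed at 20.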
In sum, under each of these four conditions (Normal, Chi squared, outlier, unbalanced),
we simulated datasets using only random intercepts or both random intercepts and random
slopes, with y either Normal or binary, and with one combination of N’s or the other—
yielding 32 distinct DGPs (4 × 2 × 2 × 2). We conducted 1000 simulations with each DGP.
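
The count of 32 follows from crossing the four factors, which can be checked with a short enumeration. The label strings below are illustrative only.

```python
from itertools import product

conditions = ["Normal", "Chi squared", "outlier", "unbalanced"]
random_parts = ["random intercepts", "random intercepts and slopes"]
outcomes = ["Normal", "binary"]
sample_sizes = [(60, 10), (30, 20)]   # (number of groups, observations per group)

# Every combination of the four factors is one data-generating process.
dgps = list(product(conditions, random_parts, outcomes, sample_sizes))
n_dgps = len(dgps)   # 4 * 2 * 2 * 2
```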
We then fitted three different models to each simulated dataset: a fixed effects model
(with naïve and clustered standard errors), a random intercepts-only model, and a random
intercepts-random slopes model.
We conducted the simulations in R. For fitting multilevel models we used the package
lme4 (Bates et al. 2015). For deriving clustered standard errors from the fixed effects models,
we used the plm package (Croissant and Millo 2008). We caught false or questionable
convergences and simply removed them, simulating a new dataset instead. This should not
bias the results, although it is worth noting that one advantage of FE is that it is unlikely to
show convergence problems, because it is estimated by OLS. We tried multiple runs of
simulations, and found stable results beyond about 200 simulations per DGP.
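
To make the fixed effects step concrete, the sketch below computes the within estimator for a single predictor by OLS on group-demeaned data, together with a naive and a cluster-robust standard error. It is a hand-rolled Python illustration, not the plm routine used for the simulations: the function name `fe_within` is hypothetical, and the clustered standard error is the plain sandwich form with no small-sample correction, which is an assumption of this sketch.

```python
from collections import defaultdict


def fe_within(x, y, groups):
    """Within (fixed effects) slope for one predictor,
    returned with its naive and cluster-robust standard errors."""
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    count = defaultdict(int)
    for xi, yi, g in zip(x, y, groups):
        sum_x[g] += xi
        sum_y[g] += yi
        count[g] += 1
    # Demeaning within each group sweeps out the group-specific intercepts.
    xd = [xi - sum_x[g] / count[g] for xi, g in zip(x, groups)]
    yd = [yi - sum_y[g] / count[g] for yi, g in zip(y, groups)]
    sxx = sum(v * v for v in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx
    resid = [b - beta * a for a, b in zip(xd, yd)]
    # Naive SE: residual variance on N - G - 1 degrees of freedom.
    df = len(x) - len(count) - 1
    se_naive = (sum(e * e for e in resid) / df / sxx) ** 0.5
    # Clustered SE: sum x * residual within each group before squaring.
    score = defaultdict(float)
    for a, e, g in zip(xd, resid, groups):
        score[g] += a * e
    se_cluster = sum(s * s for s in score.values()) ** 0.5 / sxx
    return beta, se_naive, se_cluster


# Toy data (hypothetical): two groups with different intercepts and a common slope of 2.
beta, se_naive, se_cluster = fe_within(
    [0.0, 1.0, 0.0, 1.0], [1.0, 3.0, 5.0, 7.0], ["a", "a", "b", "b"]
)
```

Comparing the two standard errors on data simulated with random slopes is one way to see the kind of discrepancy that signals an omitted random effect.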
References
Aitkin, M.: A general maximum likelihood analysis of variance components in generalized linear models.
Biometrics 55(1), 117–128 (1999)
Allison, P.D.: Using panel data to estimate the effects of events. Sociol. Methods Res. 23(2), 174–199
(1994)
Allison, P.D.: Fixed Effects Regression Methods for Longitudinal Data using SAS. SAS Press, Cary, NC
(2005)
Allison, P.D.: Fixed Effects Regression Models. Sage, London (2009)
Allison, P.D.: Problems with the hybrid method. Stat. Horiz.
http://www.statisticalhorizons.com/problems-with-the-hybrid-method (2014). Accessed 16 July 2015
Ard, K., Fairbrother, M.: Pollution prophylaxis? social capital and environmental inequality*. Soc. Sci. Q.
98(2), 584–607 (2017)
Arpino, B., Varriale, R.: Assessing the quality of institutions’ rankings obtained through multilevel linear
regression models. J. Appl. Econ. Sci. 5(1), 7–22 (2010)
Barr, D.J., Levy, R., Scheepers, C., Tily, H.J.: Random effects structure for confirmatory hypothesis testing:
keep it maximal. J. Mem. Lang. 68(3), 255–278 (2013)
Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. J. Stat.
Softw. 67(1), 1–48 (2015)
Beck, N., Katz, J.N.: What to do (and not to do) with time-series cross-section data. Am. Polit. Sci. Rev.
89(3), 634–647 (1995)
Beck, N., Katz, J.N.: Random coefficient models for time-series-cross-section data: Monte Carlo experi-
ments. Polit. Anal. 15(2), 182–195 (2007)
Bell, A., Jones, K.: Explaining fixed effects: random effects modelling of time-series cross-sectional and
panel data. Polit. Sci. Res. Methods 3(1), 133–153 (2015)
Bell, A., Johnston, R., Jones, K.: Stylised fact or situated messiness? The diverse effects of increasing debt
on national economic growth. J. Econ. Geogr. 15(2), 449–472 (2015)
Bell, A., Jones, K., Fairbrother, M.: Understanding and misunderstanding group mean centering: a commentary
on Kelley et al.’s dangerous practice. Qual. Quant. 52(5), 2031–2036 (2018)
Bell, A., Holman, D., Jones, K.: Using shrinkage in multilevel models to understand intersectionality: a
simulation study and a guide for best practice (2018) (in review)
Blakely, T.A., Woodward, A.J.: Ecological effects in multi-level studies. J. Epidemiol. Community Health
54(5), 367–374 (2000)
Brumback, B.A., Dailey, A.B., Brumback, L.C., Livingston, M.D., He, Z.: Adjusting for confounding by
cluster using generalized linear mixed models. Stat. Probab. Lett. 80(21–22), 1650–1654 (2010)
Brumback, B.A., Zheng, H.W., Dailey, A.B.: Adjusting for confounding by neighborhood using generalized
linear mixed models and complex survey data. Stat. Med. 32(8), 1313–1324 (2013)
Bryan, M.L., Jenkins, S.P.: Multilevel modelling of country effects: a cautionary tale. Eur. Sociol. Rev.
32(1), 3–22 (2016)
Bullen, N., Jones, K., Duncan, C.: Modelling complexity: analysing between-individual and between-place
variation—a multilevel tutorial. Environ. Plann. A 29(4), 585–609 (1997)
Chatelain, J.-B., Ralf, K.: Inference on time-invariant variables using panel data: a pre-test estimator with an
application to the returns to schooling. PSE Working Paper.
https://ideas.repec.org/p/hal/wpaper/halshs-01719835.html (2018). Accessed 24 Apr 2018
Christmann, P.: Economic performance, quality of democracy and satisfaction with democracy. Electoral.
Stud. 53, 79–89 (2018). https://doi.org/10.1016/J.ELECTSTUD.2018.04.004
Clark, T.S., Linzer, D.A.: Should I use fixed or random effects? Polit. Sci. Res. Methods 3(2), 399–408
(2015)
Croissant, Y., Millo, G.: Panel data econometrics in R: the plm package. J. Stat. Softw. 27(2), 1–43 (2008)
Deeming, C., Jones, K.: Investigating the macro determinants of self-rated health and well-being using the
European social survey: methodological innovations across countries and time. Int. J. Sociol. 45(4),
256–285 (2015)
Delgado-Rodríguez, M., Llorca, J.: Bias. J. Epidemiol. Community Health 58(8), 635–641 (2004)
Duncan, C., Jones, K., Moon, G.: Health-related behaviour in context: a multilevel modelling approach.
Soc. Sci. Med. 42(6), 817–830 (1996)
Fairbrother, M.: Two multilevel modeling techniques for analyzing comparative longitudinal survey data-
sets. Polit. Sci. Res. Methods 2(1), 119–140 (2014)
Fairbrother, M.: Trust and public support for environmental protection in diverse national contexts. Sociol.
Sci. 3, 359–382 (2016). https://doi.org/10.15195/v3.a17
Fielding, A.: The role of the Hausman test and whether higher level effects should be treated as random or
fixed. Multilevel Model. Newsl. 16(2), 3–9 (2004)
Fotouhi, A.R.: Comparisons of estimation procedures for nonlinear multilevel models. J. Stat. Softw. 8(9),
1–39 (2003)
Gelman, A.: Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do.
Princeton University Press, Princeton (2008)
Gelman, A.: Why I don’t use the term “fixed and random effects”. Stat. Model. Causal Inference Soc. Sci.
http://andrewgelman.com/2005/01/25/why_i_dont_use/ (2005). Accessed 19 Nov 2015
Goetgeluk, S., Vansteelandt, S.: Conditional generalized estimating equations for the analysis of clustered
and longitudinal data. Biometrics 64(3), 772–780 (2008)
Goldstein, H.: Multilevel Statistical Models, 4th edn. Wiley, Chichester (2010)
Greene, W.H.: Econometric Analysis, 7th edn. Pearson, Harlow (2012)
Grilli, L., Rampichini, C.: The role of sample cluster means in multilevel models: a view on endogeneity
and measurement error issues. Methodology 7(4), 121–133 (2011)
Grilli, L., Rampichini, C.: Specification of random effects in multilevel models: a review. Qual. Quant.
49(3), 967–976 (2015)
Halaby, C.N.: Panel models in sociological research: theory into practice. Ann. Rev. Sociol. 30(1), 507–544
(2004)
Hanchane, S., Mostafa, T.: Solving endogeneity problems in multilevel estimation: an example using educa-
tion production functions. J. Appl. Stat. 39(5), 1101–1114 (2012)
Hausman, J.A.: Specification tests in econometrics. Econometrica 46(6), 1251–1271 (1978)
Heagerty, P.J., Kurland, B.F.: Misspecified maximum likelihood estimates and generalised linear mixed
models. Biometrika 88(4), 973–985 (2001)
Hedeker, D., Mermelstein, R.J.: Mixed-effects regression models with heterogeneous variance: Analyzing
ecological momentary assessment (EMA) data of smoking. In: Little, T.D., Bovaird, J.A., Card, N.A.
(eds.) Modeling Contextual Effects in Longitudinal Studies. Erlbaum, Mahwah, NJ (2007)
Herndon, T., Ash, M., Pollin, R.: Does high public debt consistently stifle economic growth? A critique
of Reinhart and Rogoff. Camb. J. Econ. 38(2), 257–279 (2014)
Howard, A.L.: Leveraging time-varying covariates to test within- and between-person effects and inter-
actions in the multilevel linear model. Emerg. Adulthood 3(6), 400–412 (2015)
Jones, K., Bullen, N.: Contextual models of urban house prices—a comparison of fixed-coefficient and
random-coefficient models developed by expansion. Econ. Geogr. 70(3), 252–272 (1994)
Jones, K., Subramanian, S.V.: Developing Multilevel Models for Analysing Contextuality, Heterogeneity
and Change, vol. 2. University of Bristol, Bristol (2013)
Jones, K., Johnston, R., Manley, D., Owen, D., Charlton, C.: Ethnic residential segregation: a multilevel,
multigroup, multiscale approach exemplified by London in 2011. Demography 52(6), 1995–2019
(2015)
King, G., Roberts, M.: How robust standard errors expose methodological problems they do not fix.
Polit. Anal. 23(2), 159–179 (2015)
Kloosterman, R., Notten, N., Tolsma, J., Kraaykamp, G.: The effects of parental reading socialization
and early school involvement on children’s academic performance: a panel study of primary school
pupils in the Netherlands. Eur. Sociol. Rev. 27(3), 291–306 (2010)
Lauen, D.L., Gaddis, S.M.: Exposure to classroom poverty and test score achievement: contextual effects
or selection? Am. J. Sociol. 118(4), 943–979 (2013)
Leckie, G., Charlton, C.: runmlwin: a program to run the MLwiN multilevel modelling software from
within Stata. J. Stat. Softw. 52(11), 1–40 (2013). https://doi.org/10.18637/jss.v052.i11
Maas, C.J.M., Hox, J.J.: Robustness issues in multilevel regression analysis. Stat. Neerl. 58(2), 127–137
(2004)
Maimon, D., Kuhl, D.C.: Social control and youth suicidality: situating durkheim’s ideas in a multilevel
framework. Am. Sociol. Rev. 73(6), 921–943 (2008)
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., Bates, D.M.: Balancing type I error and power in
linear mixed models. J. Mem. Lang. 94, 305–315 (2017)
McCulloch, C.E., Neuhaus, J.M.: Misspecifying the shape of a random effects distribution: why getting
it wrong may not matter. Stat. Sci. 26(3), 388–402 (2011a)
McCulloch, C.E., Neuhaus, J.M.: Prediction of random effects in linear and generalized linear models
under model misspecification. Biometrics 67(1), 270–279 (2011b)
McNeish, D.: Small sample methods for multilevel modeling: a colloquial elucidation of REML and the
Kenward–Roger correction. Multivar. Behav. Res. 52(5), 661–670 (2017)
Milner, H.V., Kubota, K.: Why the move to free trade? Democracy and trade policy in the developing
countries. Int. Org. 59(1), 107–143 (2005)
Mundlak, Y.: Pooling of time-series and cross-section data. Econometrica 46(1), 69–85 (1978)
Nerlove, M.: Essays in Panel Data Econometrics. Cambridge University Press, Cambridge (2005)
Neuhaus, J.M., McCulloch, C.E.: Separating between- and within-cluster covariate effects by using con-
ditional and partitioning methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 68, 859–872 (2006)
Rasbash, J.: Module 4: multilevel structures and classifications. LEMMA VLE.
http://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/4-concepts-sample.pdf (2008). Accessed 19 Nov 2015
Raudenbush, S.W., Bloom, H.S.: Learning about and from a distribution of program impacts using mul-
tisite trials. Am. J. Eval. 36(4), 475–499 (2015)
Raudenbush, S.W., Bryk, A.: Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd
edn. Sage, London (2002)
Raudenbush, S.W., Willms, J.: The estimation of school effects. J. Educ. Behav. Stat. 20(4), 307–335 (1995)
Reinhart, C.M., Rogoff, K.S.: Growth in a time of Debt. Am. Econ. Rev. 100(2), 573–578 (2010)
Ruiter, S., van Tubergen, F.: Religious attendance in cross-national perspective: a multilevel analysis of
60 countries. Am. J. Sociol. 115(3), 863–895 (2009)
Sampson, R.J., Raudenbush, S.W., Earls, F.: Neighborhoods and violent crime: a multilevel study of col-
lective efficacy. Science 277(5328), 918–924 (1997)
Schempf, A.H., Kaufman, J.S., Messer, L., Mendola, P.: The neighborhood contribution to black-white
perinatal disparities: an example from two north Carolina counties, 1999–2001. Am. J. Epidemiol.
174(6), 744–752 (2011)
Schmidt-Catran, A.W.: Economic inequality and public demand for redistribution: combining cross-sec-
tional and longitudinal evidence. Socio Econ. Rev. 14(1), 119–140 (2016)
Schmidt-Catran, A.W., Fairbrother, M.: The random effects in multilevel models: getting them wrong
and getting them right. Eur. Sociol. Rev. 32(1), 23–38 (2015)
Schmidt-Catran, A.W., Spies, D.C.: Immigration and welfare support in Germany. Am. Sociol. Rev.
(2016). https://doi.org/10.1177/0003122416633140
Schurer, S., Yong, J.: Personality, well-being and the marginal utility of income: what can we learn from
random coefficient models? Working Paper. https://ideas.repec.org/p/yor/hectdg/12-01.html (2012).
Accessed 28 Apr 2018
Shin, Y., Raudenbush, S.W.: A latent cluster-mean approach to the contextual effects model with missing
data. J. Educ. Behav. Stat. 35(1), 26–53 (2010)
Shor, B., Bafumi, J., Keele, L., Park, D.: A Bayesian multilevel modeling approach to time-series cross-
sectional data. Polit. Anal. 15(2), 165–181 (2007)
Snijders, T.A.B., Bosker, R.J.: Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Mod-
elling, 2nd edn. Sage, London (2012)
Spiegelhalter, D.J.: Incorporating Bayesian ideas into health-care evaluation. Stat. Sci. 19(1), 156–174
(2004)
Steele, F., Vignoles, A., Jenkins, A.: The effect of school resources on pupil attainment: a multilevel simul-
taneous equation modelling approach. J. R. Stat. Soc. Ser. A Stat. Soc. 170, 801–824 (2007)
Stegmueller, D.: How many countries do you need for multilevel modeling? A comparison of frequentist
and Bayesian approaches. Am. J. Polit. Sci. 57(3), 748–761 (2013)
Subramanian, S.V., O’Malley, A.J.: Modeling neighborhood effects the futility of comparing mixed and
marginal approaches. Epidemiology 21(4), 475–478 (2010)
Subramanian, S.V., Jones, K., Kaddour, A., Krieger, N.: Revisiting Robinson: the perils of individualistic
and ecologic fallacy. Int. J. Epidemiol. 38(2), 342–360 (2009)
Vaisey, S., Miles, A.: What you can—and can’t—do with three-wave panel data. Sociol. Methods Res.
46(1), 44–67 (2017)
Vallejo, G., Fernández, P., Cuesta, M., Livacic-Rojas, P.E.: Effects of modeling the heterogeneity on infer-
ences drawn from multilevel designs. Multivar. Behav. Res. 50(1), 75–90 (2015)
Western, B.: Causal heterogeneity in comparative research: a bayesian hierarchical modelling approach.
Am. J. Polit. Sci. 42(4), 1233–1259 (1998)
Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA
(2002)
Zhang, Z., Parker, R.M.A., Charlton, C.M.J., Leckie, G., Browne, W.J.: R2MLwiN: a package to run
MLwiN from within R. J. Stat. Softw. 72(10), 1–43 (2016)
... To assess the influence of structural differences between farms and CAP-reforms on crop diversity over time, as well as the temporal trend in general, we apply a Random-effects within-between-model (REWBmodel, also called hybrid-panel-model; Bell et al., 2019). This model allows us to assess the effects of overall differences between farms (between effects) and the effects of changes at the farm level over time (within-effects). ...
... This model allows us to assess the effects of overall differences between farms (between effects) and the effects of changes at the farm level over time (within-effects). While the REWB model in principle is a reparameterisation of the well-established 'Mundlak-model' (Mundlak, 1978), it has only gained interest in recent years (Bell et al., 2019;Schunck, 2013). It should also be noted that so-called fixed-and random-effects models (FE-, respectively RE-model) used in the economic literature represent special cases of the general REWB. ...
... Further, it is possible that the withinand between-effects do not differ. In this case, the more parsimonious RE-model can be used (Bell et al., 2019). To test whether the simpler model can be applied, we follow Bell et al. (2019) and jointly test for equivalence of the between-and within-effects. ...
Article
The diversity of cultivated crops is relevant on various spatial scales, from the field and farm to the landscape. We apply a decomposition of the Shannon diversity index that allows the differentiation of functional diversity of production. The decomposition separates diversity of functional crop groups from related diversity, which shows the species diversity within the crop groups. Using population-based field and farm-level data from Sweden 2001–2018, we are able to study the development of overall (Shannon), functional and related crop diversity among a total of 83770 farms. Crop diversity indices are calculated by farm and year based on the Swedish Land Parcel Identification system (LPIS). We find that functional crop diversity has declined among Swedish farms over the period. Related crop diversity has declined but regained in recent years. Accounting for farm size and pedoclimatic conditions, organic farms have a higher functional diversity, and the uptake of organic practices leads to an increase in functional crop diversity over the period.
... Subsequently, we estimate the number of infections in all centres in which any infections occur at all. 1 Both steps are applied in all the corresponding periods, hence in wave 4a, 4b or 5. Doing so, we use models with a (1) binomial and (2) a negative binomial distribution with quadratic parameterization [22]. For both processes, we apply a random-effect panel model with demeaned data. is allows us to estimate the effects of time-constant variables, between-unit differences and within-unit changes at the same time [23,24]. ...
... When compared to ECEC centres with a share of 0 to 10 % of children from low SES households (reference group), ECEC centres with 60 % or more children from low SES households were nearly twice as likely to report an infection with children in wave 4a. We found this effect decreasing in size with the increasing frequency of infections in wave 4b (red triangle, OR 1. 23 n.s., printed in transparent colours). e same decreasing pattern is also found for the other SES levels in column 1 in Fig. 4. For the number of infections in children (column 2 in Fig. 4), SES does not play a role in wave 4a, but in 4b, and wave 5, but the effects are rather small and do not follow a clear pattern, e.g. the IRR is 1. 16 Fig. ...
Article
Full-text available
Background During the five waves of the SARS-CoV-2 pandemic so far, German early childhood education and care (ECEC) centres implemented various protective measures, such as wearing a face mask, fixed children-staff groups or regular ventilation. In addition, parents and ECEC staff were increasingly vaccinated throughout 2021. During the 4th wave, variant of concern (VOC) Delta-driven transmission indicators reached record values at the end of 2021. Those values were even exceeded in the 5th wave at the beginning of 2022 when Omicron dominated. We examine which factors facilitated or prevented infection with SARS-CoV-2 in ECEC centres, and if these differed between different phases within wave 4 (Delta) and 5 (Omicron). Methods Since August 2020, a weekly online survey among approximately 8000 ECEC managers has been conducted, monitoring both incident SARS-CoV-2 infections and protective measures taken. We included data from calendar week 26/2021 to 05/2022. We estimate the probability of any infections and the number of SARS-CoV-2 infections in children, parents and staff using random-effect-within-between (REWB) panel models for binomial and count data. Results While children, parents and staff of ECEC centres with a high proportion of children from families with low socioeconomic status (SES) have a higher risk of infections in the beginning of wave 4 (OR up to 1.99 [1.56; 2.56]), this effect diminishes for children and parents with rising incidences. Protective measures, such as wearing face masks, tend to have more extensive effects with rising incidences in wave 5 (IRR up to 0.87 [0.8; 0.93]). Further, the protective effect of vaccination against infection among staff is decreasing from wave 4 to wave 5 (OR 0.3 [0.16; 0.55] to OR 0.95, [0.84; 1.07, n.s.]). The degree of transmission from staff to child and from staff to parent is decreasing from wave 4 to wave 5, while transmission from child to staff seems to increase. 
Conclusion While Omicron seems to affect children and parents from ECEC centres with families with all SES levels more equally than Delta, the protective effect of vaccination against infection is decreasing and the effect of protective measures like face masks becomes increasingly important. In order to prevent massive closures of ECEC centres due to infection of staff, protective measures should be strictly adhered to, especially to protect staff in centres with a high proportion of children from families with low socioeconomic status.
... Moreover, random effect (RE), fixed effect (FE), and generalized method of movement (GMM) analyses were also performed. According to Bell et al. (2019), the mean of a distribution of effects is estimated by a RE and a FE attempt to estimate a single effect that is assumed to be common to all studies. Under the RE model, study weights are more even than under the FE approach. ...
Article
Full-text available
Considering the environmental deterioration challenges and a sharp decline in the quality and quantity of natural resources, the need for capitalizing on renewable energy sources and environment-friendly practices has increased a lot. The United Nations also has urged firms to shift their reliance on fossil fuels to renewable and eco-friendly practices. Several nations have begun shifting from economic growth to green productivity practices, such as green finance, green innovation, etc., which signifies collaboration between the economy, resources, and ecological development to achieve sustainable development goals (SDGs). This study is based on a comprehensive index of green finance development. Using the nonparametric data envelopment analysis and directional distance function (DEA-DFF) model, it examines the impact of green finance, financial development and green technology innovation (GTI) on green total factor productivity (GTFP) in 28 Chinese provinces from 2011 to 2021. The findings indicate that green finance raises the degree of green productivity significantly. Other elements, such as financial development and technological innovation, contribute significantly to green production. It is also found that establishing green finance legislation can help accelerate the growth of green finance. The empirical findings in this study have policy implications for China's environmental and green finance planning. This study is among the pioneer investigations integrating green finance, financial development, and green technology innovation into a unified research framework.
... A random-effects model was used for all analyses regardless of heterogeneity measures, as evidence has shown more robust effect estimates with random-effects models compared to fixed-effects models. 30,31 An analysis of proportions was pooled with a generalised linear mixed model with Clopper-Pearson intervals. 32,33 The generalised linear mixed model is advocated as a statistical approach that better accounts for within-study variation, with the assumption of a binomial likelihood for individual study events. ...
Article
Background The global burden of non-alcoholic fatty liver disease (NAFLD) parallels the increase in obesity rates across the world. Although overweight and obesity status are thought to be an effective indicator for NAFLD screening, the exact prevalence of NAFLD in this population remains unknown. We aimed to report the prevalence of NAFLD, non-alcoholic fatty liver (NAFL), and non-alcoholic steatohepatitis (NASH) in the overweight and obese population. Methods In this systematic review and meta-analysis, we searched Medline and Embase from database inception until March 6, 2022, using search terms including but not limited to “non-alcoholic fatty liver disease”, “overweight”, “obesity”, and “prevalence”. Cross-sectional and longitudinal observational studies published after Jan 1, 2000, written in or translated into English were eligible for inclusion; paediatric studies were excluded. Articles were included if the number of NAFLD, NAFL, or NASH events in an overweight and obese population could be extracted. Summary data were extracted from published reports. The primary outcomes were the prevalence of NAFLD, NAFL, and NASH in an overweight and obese population and the prevalence of fibrosis in individuals who were overweight or obese and who had NAFLD. A meta-analysis of proportions was done with the generalised linear mixed model. This study is registered with PROSPERO (CRD42022344526). Findings The search identified 7389 articles. 151 studies met the inclusion criteria and were included in the meta-analysis. In the pooled analysis comprising 101 028 individuals, the prevalence of NAFLD in the overweight population was 69·99% (95% CI 65·40–74·21; I²=99·10%), the prevalence of NAFL was 42·49% (32·55–53·08; I²=96·40%), and the prevalence of NASH was 33·50% (28·38–39·04; I²=95·60%).
Similar prevalence estimates were reported in the obese population for NAFLD (75·27% [95% CI 70·90–79·18]; I²=98·50%), NAFL (43·05% [32·78–53·97]; I²=96·30%) and NASH (33·67% [28·45–39·31]; I²=95·60%). The prevalence of NAFLD in the overweight population was the highest in the region of the Americas (75·34% [95% CI 67·31–81·93]; I²=99·00%). Clinically significant fibrosis (stages F2–4) was present in 20·27% (95% CI 11·32–33·62; I²=93·00%) of overweight individuals with NAFLD and in 21·60% (11·47–36·92; I²=95·00%) of obese patients with NAFLD, while 6·65% (4·35–10·01; I²=58·00%) of overweight individuals with NAFLD and 6·85% (3·85–11·90; I²=90·00%) of obese individuals with NAFLD had advanced fibrosis (stages F3–4). Interpretation This study summarises the estimated global prevalence of NAFLD, NAFL, and NASH in overweight and obese individuals; these findings are important for improving the understanding of the global NAFLD burden and supporting disease management in the at-risk overweight and obese population. Funding None.
... robust compared to fixed effect models. 12,13 The combined efficacy estimated as proportion of patients having regression together with their 95% confidence intervals (CI) were presented in forest plots. Besides, the relative risk (RR) with corresponding 95% CI was evaluated based on two-arm studies with the control arm of placebo. ...
Preprint
Objectives: We aim to assess the efficacy and safety of therapeutic HPV vaccines to treat cervical intraepithelial neoplasia of grade 2 or 3 (CIN2/3). Design: This study is a systematic review and meta-regression that follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses recommendations. Data sources: PubMed, Embase, Web of Science, Global Index Medicus and CENTRAL Cochrane were searched up to January 31, 2022. Eligibility criteria: Phase II/III studies reporting the efficacy of therapeutic vaccines to achieve regression of CIN2/3 lesions were included. Data extraction and synthesis: Two independent reviewers extracted data and evaluated study quality. A random-effects model was used to pool the proportions of regression and/or HPV clearance. Results: 12 trials met the inclusion criteria. Out of the total 734 women receiving therapeutic HPV vaccine for CIN2/3, 414 regressed to normal/CIN1, with an overall proportion of regression of 0.54 (95% CI: 0.39, 0.69) for the vaccinated group. Correspondingly, 166 women receiving placebo only achieved a pooled normal/CIN1 regression proportion of 0.27 (95% CI: 0.20, 0.34). When only two-arm studies were included, the regression proportion of the vaccine group was higher than that of the control group (relative risk (RR): 1.52, 95% CI: 1.14, 2.04). Six studies reported the efficacy of the therapeutic vaccines to clear high-risk human papillomavirus (hrHPV), with a pooled proportion of hrHPV clearance of 0.42 (95% CI: 0.32, 0.52) for the vaccine group and 0.17 (95% CI: 0.11, 0.26) for the control group, and an RR of 2.03 (95% CI: 1.30, 3.16). Similar results were found regarding HPV16/18 clearance. No significant unsolicited adverse events have been consistently reported. Conclusions: The efficacy of the therapeutic vaccines in the treatment of CIN2/3 was modest. In addition, implementation issues such as feasibility, acceptability, adoption, and cost-effectiveness need to be further studied.
PROSPERO registration number: CRD42020189617
... The rationale for choosing these regression models is: Our sample contains panel data, so we run ordinary least squares (OLS) regression to test the association between our variables (Winship et al. 2016). We run both fixed and random effects to check the regression for better-informed results (Bell et al., 2019). The random effect is used as suggested by the result of the Hausman test. ...
This paper investigates the relationship between key audit matters (KAMs) and audit costs and whether board size and independence affect this relationship. We hypothesise that disclosing more KAMs in the audit report is positively associated with audit costs due to the greater effort involved. Agency theory suggests that firms with good governance will mitigate the agency conflict of interest and improve financial reporting quality. Thus, good governance might moderate the relationship between reported KAMs and audit costs. Using a sample of UK FTSE All-Share non-financial firms from 2014 to 2018, we provide evidence of a significant positive relationship between KAMs and audit costs. The relationship is relatively stronger when considering the percentage of independent directors as a moderating factor. These results are consistent with the agency theory literature. However, we found no empirical evidence to support a moderating effect of board size on the relationship between KAMs and audit costs. The paper contributes to the literature assessing the regulatory changes related to audit reform and adds to the debate on the impact on audit costs. Thus, it has theoretical and practical implications for regulators, standard setters, professional bodies, shareholders, and academics.
... We favoured a fixed-effects (FE) model specification in the data analysis, as our interest was focused on the estimation and statistical inference for within-population effects of continuous predictors (hereafter also referred to as "covariates"). This approach accommodates the dependence among observations that may occur when they are clustered into higher-level groups, while also preventing any bias on regression coefficients of lower-level covariates due to omitted variables at the higher level (Allison, 2009; Bell et al., 2019) (see Supplementary Methods 1 for the application of fixed-effects versus random-effects linear models to our data). The SAS 9.4 software (SAS, 2017) was used for data analysis. ...
Article
Selection on plant functional traits may occur through their direct effects on fitness (or a fitness component), or may be mediated by attributes of plant performance which have a direct impact on fitness. Understanding this link is particularly challenging for long-lived organisms, such as forest trees, where lifetime fitness assessments are rarely achievable, and performance features and fitness components are usually quantified from early-life history stages. Accordingly, we studied a cohort of trees from multiple populations of Eucalyptus pauciflora grown in a common-garden field trial established at the hot and dry end of the species distribution on the island of Tasmania, Australia. We related the within-population variation in leaf economic (leaf thickness, leaf area and leaf density) and hydraulic (stomatal density, stomatal length and vein density) traits, measured from two-year-old plants, to two-year growth performance (height and stem diameter) and to a fitness component (seven-year survival). When performance-trait relationships were modelled for all traits simultaneously, statistical support for direct effects on growth performance was only observed for leaf thickness and leaf density. Performance-based estimators of directional selection indicated that individuals with reduced leaf thickness and increased leaf density were favoured. Survival-performance relationships were consistent with size-dependent mortality, with fitness-based selection gradients estimated for performance measures providing evidence for directional selection favouring individuals with faster growth. There was no statistical support for an effect associated with the fitness-based quadratic selection gradient estimated for growth performance. Conditional on a performance measure, fitness-based directional selection gradients estimated for the leaf traits did not provide statistical support for direct effects of the focal traits on tree survival. 
This suggested that, under the environmental conditions of the trial site and time period covered in the current study, early-stage selection on the studied leaf traits may be mediated by their effects on growth performance, which in turn has a positive direct influence on later-age survival. We discuss the potential mechanistic basis of the direct effects of the focal leaf traits on tree growth, and the relevance of a putative causal pathway of trait effects on fitness through mediation by growth performance in the studied hot and dry environment.
... A recent study assesses the choice decision and highlights the importance of RE over the FE model. It asserts RE may be a better choice for most studies (Bell, Fairbrother, & Jones, 2019), and that the Hausman test is generally misleading. In our study, the brands are very diverse and come from a variety of sectors. ...
Article
Purpose While every firm is striving to embrace digital transformation (DT) to form new differentiating business capabilities, there are dark sides to such initiatives, and it is essential to acknowledge, identify and address them. The purpose of this paper is to identify and empirically demonstrate the impact of such dark sides of DT. While a firm's DT effort may have many dark sides, the authors identify data breaches as the most critical one and focus on proving their impact since it can inflict significant damage to the firm. Design/methodology/approach Through the lens of paradox theory, the authors argue that the DT efforts of a firm will lead to increased risk and severity of data breaches. The authors developed a one-of-a-kind longitudinal data set by combining data from multiple sources, including 3604 brands over a 10-year period, and employed a DT performance scorecard to evaluate a firm's DT effort across four key digital selling touchpoints: site, mobile, digital marketing and social media. Findings The findings of this study show that a firm's DT efforts pertaining to its mobile and digital marketing platforms significantly increase the likelihood and severity of a data breach event, indicating that these two channels are most vulnerable and need heightened attention from firms. Furthermore, the findings suggest that the negative repercussions of some DT initiatives may be minimized as the firm becomes more innovative. The findings can help firms re-strategize their DT efforts by promoting security and also encouraging a balanced communication strategy. Originality/value This research is one of the first to identify, recognize and empirically illustrate the downsides of a DT effort that is otherwise thought to provide only benefits.
Article
An innovative pilot project to facilitate the transparent transfer of rental rights for publicly owned agricultural land via an ascending-price online auction was launched in Ukraine in October 2018. This paper analyses publicly disclosed auction data and investigates how competition, auction design characteristics, and farmland-specific properties influenced the auction outcomes. This information is factored into the probability of the plot being rented (i.e. auction success) and the size of the winning bid (i.e. rental rates). The analysis was conducted using an independent private values framework, employing a mixed-effects model with sample selection. Estimation results confirmed that a higher number of bidders and more active bidding lead to a significantly greater probability of auction success and higher rental rates.
Article
Mobile connectivity can negatively affect smartphone users by eliciting stress. Past research focused on stress-inducing potentials of smartphone use behaviors and, recently, on the cognitive-motivational engagement with online interactions. However, theoretical perspectives such as the mobile connectivity paradox and the IM³UNE model further suggest that digital stress effects may be conditional. A preregistered experience sampling study (n = 123; 1,427 use episodes) investigated relationships of cognitive-motivational (online vigilance) and behavioral (communication load, media multitasking) smartphone use patterns with perceived stress and introduced two situational boundary conditions (goal conflict, autonomy need dissatisfaction). Results demonstrate that online vigilance can induce stress directly and via increasing communication load. Goal conflict and autonomy need dissatisfaction moderated the influence of online vigilance and media multitasking on stress. Findings are discussed in the context of effect directionality and the need to further investigate boundary conditions in digital well-being research.
Article
Multilevel models have recently been used to empirically investigate the idea that social characteristics such as age, sex, ethnicity, and socioeconomic position are intersectional, interacting with each other to drive outcomes. Some argue this approach solves the multiple-testing problem found in standard dummy-variable (fixed-effects) regression, because intersectional effects are automatically shrunk towards their mean. The hope is that intersections appearing statistically significant by chance in a fixed-effects regression will not appear so in a multilevel model. However, this requires assumptions that are likely to be broken. We use simulations to show the effect of breaking these assumptions: when there are true main effects/interactions, un-modeled in the fixed part of the model. We show that, whilst the multilevel approach outperforms the fixed-effects approach, shrinkage is less than is desired, and some intersectional effects are likely to appear erroneously statistically significant by chance. We conclude with advice to make this promising method work robustly.
Preprint
For static panel data models that include endogenous time-invariant variables correlated with individual effects, exogenous averages over time of time-varying variables can be internal instruments. To pretest their exogeneity, we first estimate a random effects model that includes all averages over time of time-varying variables (Mundlak, 1978; Krishnakumar, 2006). Internal instruments are then selected if their parameter is statistically different from zero (Mundlak, 1978; Hausman and Taylor, 1981). Finally, we estimate a Hausman-Taylor (1981) model using these internal instruments. We then evaluate the biases of currently used alternative estimators in a Monte-Carlo simulation: repeated between, ordinary least squares, two-stage restricted between, Oaxaca-Geisler estimator, fixed effect vector decomposition, and random effects (restricted generalized least squares).
Article
This book presents recent advances in the statistical treatment of data involving several levels of aggregation. The most interesting part addresses event-history (biographical) models from a multilevel perspective, but this raises many problems in demography. It requires surveys more detailed than the usual event-history surveys, ones that simultaneously take into account the various social contexts in which individuals live.
Article
Kelley et al. argue that group-mean-centering covariates in multilevel models is dangerous, since, they claim, it generates results that are biased and misleading. We argue instead that what is dangerous is Kelley et al.'s unjustified assault on a simple statistical procedure that is enormously helpful, if not vital, in analyses of multilevel data. Kelley et al.'s arguments appear to be based on a faulty algebraic operation, and on a simplistic argument that parameter estimates from models with mean-centered covariates must be wrong merely because they are different from those from models with uncentered covariates. They also fail to explain why researchers should dispense with mean-centering when it is central to the estimation of fixed effects models, a common alternative approach to the analysis of clustered data, albeit one increasingly incorporated within a random effects framework. Group-mean-centering is, in short, no more dangerous than any other statistical procedure, and should remain a normal part of multilevel data analyses where it can be judiciously employed to good effect.
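As an illustration of why group-mean-centering is central to fixed effects estimation, the following is a minimal sketch on simulated data, not code from any of the cited papers. All variable names, sample sizes, and coefficients are assumptions chosen for the example, and pooled OLS stands in for a full random effects fit to keep the example short. It shows that the coefficient on the group-mean-centered covariate in a within-between specification coincides with the fixed effects (within) estimate.

```python
import numpy as np

# Simulated clustered data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_groups, n_per = 50, 10
group = np.repeat(np.arange(n_groups), n_per)

u = rng.normal(size=n_groups)                      # unobserved group-level effect
x = 0.8 * u[group] + rng.normal(size=group.size)   # covariate correlated with u
y = 1.0 + 2.0 * x + u[group] + rng.normal(size=group.size)

# Group means, broadcast back to the observation level.
counts = np.bincount(group)
x_mean = (np.bincount(group, weights=x) / counts)[group]
y_mean = (np.bincount(group, weights=y) / counts)[group]

# Group-mean-centered ("within") part of the covariate.
x_within = x - x_mean

# Fixed effects (within) estimator: demeaned y regressed on demeaned x.
beta_fe = np.sum(x_within * (y - y_mean)) / np.sum(x_within ** 2)

# Within-between specification fitted by pooled OLS:
# y ~ 1 + x_within + x_mean
X = np.column_stack([np.ones_like(x), x_within, x_mean])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_within, beta_between = coef[1], coef[2]

print(f"FE within estimate:        {beta_fe:.4f}")
print(f"Within coefficient (OLS):  {beta_within:.4f}")
print(f"Between coefficient (OLS): {beta_between:.4f}")
```

The two within estimates agree because the centered covariate sums to zero inside every group, so it is orthogonal to the intercept and to any group-level variable, including the group mean. The between coefficient differs from the within coefficient here because the simulated covariate is correlated with the group-level effect.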
Article
Studies on small sample properties of multilevel models have become increasingly prominent in the methodological literature in response to the frequency with which small sample data appear in empirical studies. Simulation results generally recommend that empirical researchers employ restricted maximum likelihood estimation (REML) with a Kenward-Roger correction with small samples in frequentist contexts to minimize small sample bias in estimation and to prevent inflation of Type-I error rates. However, simulation studies focus on recommendations for best practice and there is little to no explanation of why traditional maximum likelihood (ML) breaks down with smaller samples, what differentiates REML from ML, or how the Kenward-Roger correction remedies lingering small sample issues. Due to the complexity of these methods, most extant descriptions are highly mathematical and are intended to prove that the methods improve small sample performance as intended. Thus, empirical researchers have documentation that these methods are advantageous but still lack resources to help understand what the methods actually do and why they are needed. This tutorial explains why ML falters with small samples, how REML circumvents some issues, and how Kenward-Roger works. We do so without equations or derivations to support more widespread understanding and use of these valuable methods.
Article
This study tests the links between economic performance, democratic quality and satisfaction with democracy (SWD) at multiple levels. By analysing a time-series cross-sectional (TSCS) dataset of 57 democracies between 1990 and 2014, it finds the two types of performance matter almost equally: countries with good democratic and economic records tend to show higher levels of SWD than countries without them. Over time, an improvement in ‘objective’ democratic and economic conditions is shown to be related to increasing levels of national SWD. The second part of the study reconfirms these relationships at the individual level by analysing survey data from the “Europeans' understandings and evaluations of democracy” special module of the sixth round of the European Social Survey. It shows that respondents' evaluations of the economy and democracy are strongly related to their SWD. Finally, it demonstrates that the effect of objective democratic and economic performance on SWD is mediated by people's evaluations of it.